PROBABILISTIC LOGICS AND THE SYNTHESIS OF RELIABLE
ORGANISMS FROM UNRELIABLE COMPONENTS
By J. von Neumann 


1. INTRODUCTION 

The paper which follows is based on notes taken by R. S. Pierce on five lectures 
given by the author at the California Institute of Technology in January 1952. They 
have been revised by the author, but they reflect, apart from stylistic changes, the 
lectures as they were delivered. The author intends to prepare an expanded version 
for publication, and the present write up, which is imperfect in various ways, does 
therefore not represent the final and complete publication. That will be more detailed 
in several respects, primarily in the mathematical discussions of Sections 9 (p. 30) and 
10 (p. 37) (especially in Sections 10.2 (p. 38) and 10.5.2 (p.43)), and, also in some of 
the parts dealing with logics and with the synthesis of automata. The neurological 
connections may then also be explored somewhat further.* The present write up 
is nevertheless presented in this form because the field is in a state of rapid flux, 
and therefore for ideas that bear on it an exposition without too much delay seems 
desirable. 

The analytical table of contents (p. v) which precedes this will give a reasonably 
close orientation about the contents - indeed the title should be fairly self-explanatory. 
The subject-matter is the role of error in logics, or in the physical implementation of 
logics - in automata-synthesis. Error is viewed, therefore, not as an extraneous and 
misdirected or misdirecting accident, but as an essential part of the process under 
consideration - its importance in the synthesis of automata being fully comparable to
that of the factor which is normally considered, the intended and correct logical
structure.

Our present treatment of error is unsatisfactory and ad hoc. It is the author’s 
conviction, voiced over many years, that error should be treated by thermodynamical 
methods and be the subject of a thermodynamical theory, as information has been by 
the work of L. Szilard and C. E. Shannon. (Cf. 5.2 (p. 15)). The present treatment 
falls far short of achieving this, but it assembles, it is hoped, some of the building 
materials which will have to enter into the final structure. 

The author wants to express his thanks to K. A. Brueckner and M. Gell-Mann, then
at the University of Illinois, to discussions with whom in 1951 he owes some important
stimuli on this subject; to R. S. Pierce at the California Institute of Technology, on 
whose excellent notes this exposition is based; and to the California Institute of 
Technology whose invitation to deliver these lectures combined with the very warm 
reception by the audience caused him to write this paper in its present form. 


* ed: No evidence has been found that these expansions were ever written. 
2. A SCHEMATIC VIEW OF AUTOMATA

2.1 Logics and Automata 

It has been pointed out by A. M. Turing [1] in 1937 and by W. S. McCulloch 
and W. Pitts [2] in 1943 that effectively constructive logics, that is, intuitionistic 
logics, can be best studied in terms of automata. Thus logical propositions can 
be represented as electrical networks or (idealized) nervous systems. Whereas logical 
propositions are built up by combining certain primitive symbols, networks are formed 
by connecting basic components, such as relays in electrical circuits and neurons in 
the nervous system. A logical proposition is then represented as a “black box” which 
has a finite number of inputs (wires or nerve bundles) and a finite number of outputs. 
The operation performed by the box is determined by the rules defining which inputs, 
when stimulated, cause responses in what outputs, just as a propositional function is 
determined by its values for all possible assignments of values to its variables. 

There is one important difference between ordinary logic and the automata which 
represent it. Time never occurs in logic, but every network or nervous system has 
a definite time lag between the input signal and the output response. A definite 
temporal sequence is always inherent in the operation of such a real system. This is 
not entirely a disadvantage. For example, it prevents the occurrence of various kinds 
of more or less overt vicious circles (related to “non-constructivity,” “impredicativity,” 
and the like) which represent a major class of dangers in modern logical systems. It 
should be emphasized again, however, that the representative automaton contains 
more than the content of the logical proposition which it symbolizes — to be precise, 
it embodies a definite time lag. 

Before proceeding to a detailed study of a specific model of logic, it is necessary 
to add a word about notation. The terminology used in the following is taken from 
several fields of science; neurology, electrical engineering, and mathematics furnish 
most of the words. No attempt is made to be systematic in the application of terms, 
but it is hoped that the meaning will be clear in every case. It must be kept in mind 
that few of the terms are being used in the technical sense which is given to them in 
their own scientific field. Thus, in speaking of a neuron we don’t mean the animal 
organ, but rather one of the basic components of our network which resembles an 
animal neuron only superficially, and which might equally well have been called an 
electrical relay. 


2.2 Definitions of the fundamental concepts. 

Externally an automaton is a “black box” with a finite number of inputs and a 
finite number of outputs. Each input and each output is capable of exactly two states, 
to be designated as the “stimulated” state and the “unstimulated” state, respectively. 
The internal functioning of such a “black box” is equivalent to a prescription which 
specifies what outputs will be stimulated in response to the stimulation of any given 
combination of the inputs, and also of the time of stimulation of these outputs. As
stated above, it is definitely assumed that the response occurs only after a time lag, 
but in the general case the complete response may consist of a succession of responses 
occurring at different times. This description is somewhat vague. To make it more 
precise it will be convenient to consider first automata of a somewhat restricted type 
and to discuss the synthesis of the general automaton later.

DEFINITION 1: A single output automaton with time delay $\delta$ ($\delta$ is positive) is a
finite set of inputs, exactly one output, and an enumeration of certain “preferred”
subsets of the set of all inputs. The automaton stimulates its output at time $t + \delta$
if and only if at time t the stimulated inputs constitute a subset which appears in
the list of “preferred” subsets describing the automaton. In the above definition the
expression “enumeration of certain subsets” is taken in its widest sense and does not
exclude the extreme cases “all” or “none.” If n is the number of inputs, then there
exist $2^{2^n}$ such automata for any given $\delta$.
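
Definition 1 can be made concrete with a small sketch. The names `make_automaton`, `all_subsets` and `count_automata` below are illustrative assumptions, not notation from the lectures. An automaton is stored as its family of preferred subsets; since n inputs have $2^n$ subsets and any family of them may be preferred, the count of distinct automata for a fixed delay comes out as $2^{2^n}$.

```python
from itertools import combinations


def make_automaton(preferred, delay=1):
    """Single output automaton: the output is stimulated at time t + delay
    iff the set of inputs stimulated at time t is a preferred subset."""
    preferred = {frozenset(s) for s in preferred}

    def respond(stimulated):
        return frozenset(stimulated) in preferred

    return respond, delay


def all_subsets(n):
    """All subsets of the input set {0, ..., n-1}."""
    return [frozenset(c) for k in range(n + 1)
            for c in combinations(range(n), k)]


def count_automata(n):
    """One automaton per family of preferred subsets (for a fixed delay)."""
    return 2 ** len(all_subsets(n))


# Hypothetical example: two inputs, output responds only when both are stimulated.
both, delay = make_automaton([{0, 1}])
```

Here `both({0, 1})` is true and `both({0})` is false; the extreme cases "all" and "none" correspond to passing every subset or an empty list.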

Frequently several automata of this type will have to be considered simultaneously.
They need not all have the same time delay, but it will be assumed that all
their time lags are integral multiples of a common value $\delta_0$. This assumption may not
be correct for an actual nervous system; the model considered may apply only to an
idealized nervous system. In partial justification, it can be remarked that as long as
only a finite number of automata are considered, the assumption of a common value
$\delta_0$ can be realized within any degree of approximation. Whatever its justification
and whatever its meaning in relation to actual machines or nervous systems, this
assumption will be made in our present discussions. The common value $\delta_0$ is chosen
for convenience as the time unit. The time variable can now be made discrete, i.e. it
need assume only integral numbers as values, and correspondingly the time delays of
the automata considered are positive integers.

Single output automata with given time delays can be combined into a new
automaton. The outputs of certain automata are connected by lines or wires or nerve
fibers to some of the inputs of the same or other automata. The connecting lines
are used only to indicate the desired connections; their function is to transmit the
stimulation of an output instantaneously to all the inputs connected with that
output. The network is subjected to one condition, however. Although the same output
may be connected to several inputs, any one input is assumed to be connected to
at most one output. It may be clearer to impose this restriction on the connecting
lines, by requiring that each input and each output be attached to exactly one line,
allowing lines to be split into several lines, but prohibiting the merging of two or more
lines. This convention makes it advisable to mention again that the activity of an
output or an input, and hence of a line, is an all or nothing process. If a line is split,
the stimulation is carried to all the branches in full. No energy conservation laws 
enter into the problem. In actual machines or neurons, the energy is supplied by the 
neurons themselves from some external source of energy. The stimulation acts only 
as a trigger device. 

The most general automaton is defined to be any such network. In general it will 
have several inputs and several outputs and its response activity will be much more 
complex than that of a single output automaton with a given time delay. An intrinsic 
definition of the general automaton, independent of its construction as a network,
can be supplied. It will not be discussed here, however. 

Of equal importance to the problem of combining automata into new ones is 
the converse problem of representing a given automaton by a network of simpler 
automata, and of determining eventually a minimum number of basic types for these 
simpler automata. As will be shown, very few types are necessary. 


2.3 Some basic organs. 

The automata to be selected as a basis for the synthesis of all automata will be 
called basic organs. Throughout what follows, these will be single output automata. 

One type of basic organ is described by Figure 1. It has one output, and may 
have any finite number of inputs. These are grouped into two types: Excitatory and 
inhibitory inputs. The excitatory inputs are distinguished from the inhibitory inputs 
by the addition of an arrowhead to the former and of a small circle to the latter. 
This distinction of inputs into two types does not actually relate to the concept of
inputs; it is introduced as a means to describe the internal mechanism of the neuron.
This mechanism is fully described by the so-called threshold function $\varphi(x)$ written
inside the large circle symbolizing the neuron in Figure 1, according to the following
convention: The output of the neuron is excited at time t + 1 if and only if at time t
the number k of stimulated excitatory inputs and the number $\ell$ of stimulated inhibitory
inputs satisfy the relation $k \ge \varphi(\ell)$. (It is reasonable to require that the function
$\varphi(x)$ be monotone non-decreasing.) For the purposes of our discussion of this subject
it suffices to use only certain special classes of threshold functions $\varphi(x)$. E.g.


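Fig. 1

(1) $\varphi(x) \equiv \psi_h(x) = \begin{cases} 0 & \text{for } x < h \\ \infty & \text{for } x \ge h \end{cases}$

(i.e. $< h$ inhibitions are absolutely ineffective, $\ge h$ inhibitions are absolutely effective),
or

(2) $\varphi(x) \equiv \chi_h(x) = x + h$

(i.e. the excess of stimulations over inhibitions must be $\ge h$). We will use $\chi_h$ and
write the inhibition number h (instead of $\chi_h$) inside the large circle symbolizing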
the neuron. Special cases of this type are the three basic organs shown in Figure 
2. These are, respectively, a threshold two neuron with two excitatory inputs, a 
threshold one neuron with two excitatory inputs, and finally a threshold one neuron
with one excitatory input and one inhibitory input. 
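
A minimal sketch of these three organs follows, assuming the convention that a neuron with number h written in its circle responds when the stimulated excitatory inputs exceed the stimulated inhibitory inputs by at least h. The function names are illustrative only, and the unit delay is left implicit: each call gives the output at time t + 1 from the inputs at time t.

```python
def neuron(h, excitatory, inhibitory=()):
    """Threshold neuron: returns 1 iff
    (#stimulated excitatory inputs) >= (#stimulated inhibitory inputs) + h."""
    k = sum(excitatory)
    l = sum(inhibitory)
    return int(k >= l + h)


def organ_and(a, b):
    """Threshold two, two excitatory inputs: represents ab."""
    return neuron(2, (a, b))


def organ_or(a, b):
    """Threshold one, two excitatory inputs: represents a + b."""
    return neuron(1, (a, b))


def organ_and_not(a, b):
    """Threshold one, a excitatory and b inhibitory: represents a and not b."""
    return neuron(1, (a,), (b,))
```

Evaluating the three functions on all four input pairs reproduces the truth tables of conjunction, disjunction, and "a and not b".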

The automata with one output and one input described by the networks shown 
in Figure 3 have simple properties: The first one’s output is never stimulated, the 
second one’s output is stimulated at all times if its input has been ever (previously) 
stimulated. Rather than add these automata to a network, we shall permit lines
leading to an input to be either always non-stimulated, or always stimulated. We call
the former “grounded” and the latter “live,” and designate each by its own terminal
symbol in the diagrams.


3. AUTOMATA AND THE PROPOSITIONAL CALCULUS 

3.1 The Propositional Calculus 

The propositional calculus deals with propositions irrespective of their truth. The
set of propositions is closed under operations of negation, conjunction and disjunction.
If a is a proposition, then “not a”, denoted by $a^{-1}$ (we prefer this designation to the
more conventional ones $-a$ and $\sim a$), is also a proposition. If a, b are two propositions,
then “a and b”, “a or b”, denoted respectively by $ab$, $a + b$, are also propositions.
Propositions fall into two sets, T and F, depending on whether they are true or false.
The proposition $a^{-1}$ is in T if and only if a is in F. The proposition $ab$ is in T
if and only if a and b are both in T, and $a + b$ is in T if and only if either a or
b is in T. Mathematically speaking the set of propositions, closed under the three
fundamental operations, is mapped by a homomorphism onto the Boolean algebra of
the two elements 1 and 0. A proposition is true if and only if it is mapped onto the
element 1. For convenience, denote by 1 the proposition $a + a^{-1}$, by 0 the proposition
$aa^{-1}$, where a is a fixed but otherwise arbitrary proposition. Of course, 0 is false and
1 is true.

A polynomial V in n variables, $n \ge 1$, is any formal expression obtained from
$x_1, \ldots, x_n$ by applying the fundamental operations to them a finite number of times;
for example $[(x_1 + x_2^{-1}) x_3]^{-1}$ is a polynomial. In the propositional calculus two
polynomials in the same variables are considered equal if and only if for any choice
of the propositions $x_1, \ldots, x_n$ the resulting two propositions are always either both
true or both false. A fundamental theorem of the propositional calculus states that
every polynomial V is equal to

$$\sum_{i_1 = \pm 1} \cdots \sum_{i_n = \pm 1} f_{i_1 \ldots i_n} \, x_1^{i_1} \cdots x_n^{i_n},$$

where each of the $f_{i_1 \ldots i_n}$ is equal to 0 or 1. Two polynomials are equal if and only if
their f’s are equal. In particular, for each n, there exist exactly $2^{2^n}$ polynomials.
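
The normal form can be computed mechanically by evaluating a polynomial on every assignment of truth values. The sketch below is an illustration under stated assumptions: the helper `normal_form` and both example polynomials are hypothetical, with each coefficient indexed by a sign pattern in which +1 stands for the variable itself and -1 for its negation.

```python
from itertools import product


def normal_form(p, n):
    """Return {(i_1, ..., i_n): f} with each i_v = +1 or -1 and f in {0, 1}.
    The coefficient f is the value of p when x_v is true exactly for
    those v with i_v = +1."""
    coeffs = {}
    for signs in product((1, -1), repeat=n):
        values = [s == 1 for s in signs]
        coeffs[signs] = int(bool(p(*values)))
    return coeffs


def example(x1, x2, x3):
    """Illustrative polynomial: the negation of (x1 or not x2) and x3."""
    return not ((x1 or not x2) and x3)


def rewritten(x1, x2, x3):
    """The same function in another form: (not x1 and x2) or not x3."""
    return ((not x1) and x2) or (not x3)
```

Two expressions denote equal polynomials exactly when their coefficient tables agree, so `normal_form(example, 3) == normal_form(rewritten, 3)` holds. Each table has $2^n$ entries, each 0 or 1, which accounts for the $2^{2^n}$ distinct polynomials.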
3.2 Propositions, automata and delays.


These remarks enable us to describe the relationship between automata and the
propositional calculus. Given a time delay s, there exists a one-to-one correspondence
between single output automata with time delay s and the polynomials of the
propositional calculus. The number n of inputs (to be designated $\nu = 1, \ldots, n$) is equal to
the number of variables. For every combination $i_1 = \pm 1, \ldots, i_n = \pm 1$, the coefficient
$f_{i_1 \ldots i_n} = 1$ if and only if a stimulation at time t of exactly those inputs $\nu$ for which
$i_\nu = 1$ produces a stimulation of the output at time t + s.

DEFINITION 2: Given a polynomial $V = V(x_1, \ldots, x_n)$ and a time delay s, we
mean by a V, s-network a network built from the three basic organs of Figure 2
(p. 6), which as an automaton represents V with time delay s.

THEOREM 1: Given any V, there exists a (unique) $s^* = s^*(V)$, such that a V, s-network
exists if and only if $s \ge s^*$.

PROOF: Consider a given V. Let S(V) be the set of those s for which a V, s-network
exists. If $s' \ge s$, then tying $s' - s$ unit-delays, as shown in Figure 4 (p. 6), in series to
the output of a V, s-network produces a V, s'-network. Hence S(V) contains with
an s all $s' \ge s$. Hence if S(V) is not empty, then it is precisely the set of all $s \ge s^*$,
where $s^* = s^*(V)$ is its smallest element. Thus the theorem holds for V if S(V) is not
empty, i.e. if the existence of at least one V, s-network (for some s!) is established.


Fig. 2

Fig. 3

Fig. 4
Now the proof can be effected by induction over the number g = g(V) of symbols
used in the definitory expression for V (counting each occurrence of each symbol
separately).

If g(V) = 1, then $V(x_1, \ldots, x_n) = x_\nu$ (for one of the $\nu = 1, \ldots, n$). The “trivial”
network which obtains by breaking off all input lines other than $\nu$, and taking the
input line $\nu$ directly to the output, solves the problem with s = 0. Hence $s^*(V) = 0$.

If g(V) > 1, then $V = Q^{-1}$ or $V = QR$ or $V = Q + R$, where g(Q), g(R) < g(V).
For $V = Q^{-1}$ let the box [Q] represent a Q, s'-network, with $s' = s^*(Q)$. Then the
network shown in Figure 5 (p. 7) is clearly a V, s-network, with s = s' + 1. Hence
$s^*(V) \le s^*(Q) + 1$. For $V = QR$ or $Q + R$ let the boxes [Q], [R] represent a Q, s''-network
and an R, s''-network, respectively, with $s'' = \max(s^*(Q), s^*(R))$. Then the
network shown in Figure 6 (p. 7) is clearly a V, s-network, with $V = QR$ or $Q + R$ for
h = 2 or 1, respectively, and with s = s'' + 1. Hence $s^*(V) \le \max(s^*(Q), s^*(R)) + 1$.
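
The induction yields an upper bound on the minimum delay that can be read off the defining expression. The sketch below assumes a hypothetical encoding of polynomials as nested tuples, `("var", i)`, `("not", Q)`, `("and", Q, R)` and `("or", Q, R)`. It computes only the bound supplied by the inductive steps, not necessarily the minimum delay itself.

```python
def delay_bound(expr):
    """Upper bound on the minimum delay, following the inductive construction:
    0 for a single variable, one more unit delay for each further operation."""
    op = expr[0]
    if op == "var":
        return 0  # trivial network: input line taken directly to the output
    if op == "not":
        return delay_bound(expr[1]) + 1
    if op in ("and", "or"):
        # Both sub-networks are brought to a common delay, then one organ follows.
        return max(delay_bound(expr[1]), delay_bound(expr[2])) + 1
    raise ValueError(f"unknown operation: {op}")


x1, x2, x3 = ("var", 1), ("var", 2), ("var", 3)
# Illustrative polynomial: not ((x1 or not x2) and x3)
example = ("not", ("and", ("or", x1, ("not", x2)), x3))
```

For this example the bound is 4: the inner negation gives 1, the disjunction 2, the conjunction 3, and the outer negation 4.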


Combine the above theorem with the fact that every single output automaton
can be equivalently described - apart from its time delay s - by a polynomial V, and
that the basic operations $ab$, $a + b$, $a^{-1}$ of the propositional calculus are represented
(with unit delay) by the basic organs of Figure 2 (p. 6). (For the last one, which
represents $ab^{-1}$, cf. the remark at the beginning of 4.1.1 (p. 9).) This gives:


DEFINITION 3: Two single output automata are equivalent in the wider sense, if 
they differ only in their time delays — but otherwise the same input stimuli produce 
the same output stimulus (or non-stimulus) in both. 

THEOREM 2 (Reduction Theorem):

Any single output automaton $\vartheta$ is equivalent in the wider sense to a network of
basic organs of Figure 2 (p. 6). There exists a (unique) $s^* = s^*(\vartheta)$, such that the latter
network exists if and only if its prescribed time delay s satisfies $s \ge s^*$.
3.3 Universality. General logical considerations.

Now networks of arbitrary single output automata can be replaced by networks
of basic organs of Figure 2 (p. 6): It suffices to replace the unit delay in the former
system by s unit delays in the latter, where s is the maximum of the $s^*(\vartheta)$ of all the
single output automata that occur in the former system. Then all delays that will
have to be matched will be multiples of s, hence $\ge s$, hence $\ge s^*(\vartheta)$ for all $\vartheta$ that can
occur in this situation, and so the Reduction Theorem will be applicable throughout.

Thus this system of basic organs is universal: It permits the construction of 
essentially equivalent networks to any network that can be constructed from any 
system of single output automata. I.e. no redefinition of the system of basic organs 
can extend the logical domain covered by the derived networks. 

The general automaton is any network of single output automata in the above 
sense. It must be emphasized that, in particular, feedbacks, i.e. arrangements of lines 
which may allow cyclical stimulation sequences, are allowed. (I.e. configurations like 
those shown in Figure 7 (p. 9), etc. There will be various, non-trivial, examples of this 
later.) The above arguments have shown that a limitation of the underlying single 
output automata to our original basic organs causes no essential loss of generality. 
The question as to which logical operations can be equivalently represented (with 
suitable, but not a priori specified delays) is nevertheless not without difficulties. 

These general automata are, in particular, not immediately equivalent to all of 
effectively constructive (intuitionistic) logics. I.e. given a problem involving (a finite
number of) variables, which can be solved (identically in these variables) by effective
construction, it is not always possible to construct a general automaton that will
produce this solution identically (i.e. under all conditions). The reason for this is
essentially, that the memory requirements of such a problem may depend on (actual 
values assumed by) the variables (i.e. they must be finite for any specific system of 
values of the variables, but they may be unbounded for the totality of all possible 
systems of values), while a general automaton in the above sense, necessarily has a 
fixed memory capacity. I.e. a fixed general automaton can only handle (identically, 
i.e. generally) a problem with fixed (bounded) memory requirements. 

We need not go here into the details of this question. Very simple addenda 
can be introduced to provide for a (finite but) unlimited memory capacity. How 
this can be done has been shown by A. M. Turing [1]. Turing’s analysis loc. cit. 
also shows that with such addenda general automata become strictly equivalent to 
effectively constructive (intuitionistic) logics. Our system in its present form (i.e. 
general automata with limited memory capacity) is still adequate for the treatment 
of all problems with neurological analogies, as our subsequent examples will show. 
(Cf. also W. S. McCulloch and W. Pitts [2].) The exact logical domain that they 
cover has been recently characterized by Kleene [3]. We will return to some of these 
questions in Section 5.1 (p. 14).
4. BASIC ORGANS.


4.1 Reduction of the basic components. 


4.1.1 The simplest reductions. 

The previous Section makes clear the way in which the elementary neurons should
be interpreted logically. Thus the ones shown in Figure 2 (p. 6) respectively represent
the logical functions $ab$, $a + b$, and $ab^{-1}$. In order to get $b^{-1}$, it suffices to make
the a-terminal of the third organ, as shown in Figure 8 (p. 10), live. This will be
abbreviated in the following, as shown in Figure 8 (p. 10).

Now since $ab = (a^{-1} + b^{-1})^{-1}$ and $a + b = (a^{-1} b^{-1})^{-1}$, it is clear that
the first organ among the three basic organs shown in Figure 2 (p. 6) is equivalent to
a system built of the remaining two organs there, and that the same is true for the
second organ there. Thus the first and second organs shown in Figure 2 (p. 6) are
respectively equivalent (in the wider sense) to the two networks shown in Figure 9
(p. 10). This tempts one to consider a new system, in which the negation organ $a^{-1}$
(viewed as a basic entity in its own right, and not an abbreviation for a composite,
as in Figure 8 (p. 10)), and either the first or the second basic organ in Figure 2
(p. 6), are the basic organs. They permit forming the second or the first basic organ
in Figure 2 (p. 6), respectively, as shown above, as (composite) networks. The third
basic organ in Figure 2 (p. 6) is easily seen to be also equivalent (in the wider sense)
to a composite of the above but, as was observed at the beginning of 4.1.1 (p. 9), the
necessary organ is in any case not this, but the negation organ $a^{-1}$ (cf. also the
remarks concerning Figure 8 (p. 10)). Thus either system of new basic organs permits
reconstructing (as composite networks) all the (basic) organs of the original system.
It is true that these constructs have delays varying from 1 to 3, but since unit delays,
as shown in Figure 4 (p. 6), are available in either new system, all these delays can be
brought up to the value 3. Then a trebling of the unit delay time obliterates all differences.

To restate: Instead of the three original basic organs, shown again in Figure 10 
(p. 10), we can also (essentially equivalently) use the two basic organs Nos. one and 
three or Nos. two and three in Figure 10 (p. 10). 
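
The two identities behind this reduction can be checked exhaustively. The sketch below is an illustration with assumed function names; it treats each organ as a Boolean function and ignores timing, noting only in comments that each composite passes through three organs in series.

```python
def neg(a):
    """Negation organ."""
    return 1 - a


def conj(a, b):
    """First basic organ: ab."""
    return a & b


def disj(a, b):
    """Second basic organ: a + b."""
    return a | b


def conj_from_disj(a, b):
    """ab rebuilt from disjunction and negation: negate, OR, negate
    (three organs in series, hence three unit delays)."""
    return neg(disj(neg(a), neg(b)))


def disj_from_conj(a, b):
    """a + b rebuilt from conjunction and negation: negate, AND, negate."""
    return neg(conj(neg(a), neg(b)))
```

On all four input pairs, `conj_from_disj` agrees with `conj` and `disj_from_conj` agrees with `disj`.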
4.1.2 The double line trick.


This result suggests strongly that one consider the one remaining combination, 
too: The two basic organs Nos. one and two in Figure 10 (p. 10), as the basis of an 
essentially equivalent system. 

One would be inclined to infer that the answer must be negative: No network 
built out of the two first basic organs of Figure 10 (p. 10) can be equivalent (in the 
wider sense) to the last one. Indeed, let us attribute to T and F, i.e. to the stimulated
or non-stimulated state of a line, respectively, the “truth values” 1 or 0, respectively.
Keeping the ordering 0 < 1 in mind, the state of the output is a monotone
non-decreasing function of the states of the inputs for both basic organs Nos. one and two
in Figure 10 (p. 10), and hence for all networks built from these organs exclusively as
well. This, however, is not the case for the last organ of Figure 10 (p. 10) (nor for the
last organ of Figure 2 (p. 6)), irrespectively of delays.
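
The monotonicity argument can be verified by brute force on small functions. The following sketch uses a hypothetical helper `is_monotone`, which tests whether raising any single input from 0 to 1 can ever lower the output.

```python
from itertools import product


def is_monotone(f, n):
    """True iff f (on n inputs valued 0/1) is monotone non-decreasing:
    raising one input from 0 to 1 never lowers the output."""
    for xs in product((0, 1), repeat=n):
        for i in range(n):
            if xs[i] == 0:
                ys = xs[:i] + (1,) + xs[i + 1:]
                if f(*xs) > f(*ys):
                    return False
    return True


def conj(a, b):
    return a & b


def disj(a, b):
    return a | b


def and_not(a, b):
    """The third organ: a and not b."""
    return a & (1 - b)


def composite(a, b, c):
    """An arbitrary network of the first two organs only."""
    return disj(conj(a, b), c)
```

Conjunction, disjunction and the composite pass the test, while `and_not` and plain negation fail it, which is the obstruction described in the text.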

Fig. 8

Fig. 9

Fig. 10



Nevertheless a slight change of the underlying definitions permits one to circumvent
this difficulty and to get rid of the negation (the last organ of Figure 10 (p. 10))
entirely. The device which effects this is of additional methodical interest, because
it may be regarded as the prototype of one that we will use later on in a more
complicated situation. The trick in question is to represent propositions on a double line
instead of a single one. One assumes that of the two lines, at all times precisely one is
stimulated. Thus there will always be two possible states of the line pair: The first
line stimulated, the second non-stimulated; and the second line stimulated, the first
non-stimulated. We let one of these states correspond to the stimulated single line
of the original system (that is, to a true proposition) and the other state to the
unstimulated single line (that is, to a false proposition). Then the three fundamental
Boolean operations can be represented by the three first schemes shown in Figure 11
(p. 12). (The last scheme shown in Figure 11 relates to the original system of Figure
2 (p. 6).)

In these diagrams, a true proposition corresponds to 1 stimulated, 2 unstimulated, 
while a false proposition corresponds to 1 unstimulated, 2 stimulated. The networks 
of Figure 11 (p. 12), with the exception of the third one, have also the correct delays:
Unit delay. The third one has zero delay, but whenever this is not wanted, it can be
replaced by unit delay, by replacing the third network by the fourth one, making its
$a_1$-line live, its $a_2$-line grounded, and then writing a for its b.
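
A sketch of the double line representation follows. Since the diagrams are not available here, the wiring is an assumption: conjunction and disjunction are applied to the first lines and the dual operation to the second lines, and negation is a mere exchange of the two lines, which is consistent with the zero delay mentioned above. All names are illustrative.

```python
def encode(bit):
    """True: line 1 stimulated, line 2 not. False: the reverse."""
    return (1, 0) if bit else (0, 1)


def decode(pair):
    if pair not in ((1, 0), (0, 1)):
        raise ValueError("exactly one of the two lines must be stimulated")
    return pair[0]


def conj(a, b):
    return a & b  # first basic organ


def disj(a, b):
    return a | b  # second basic organ


def dual_and(p, q):
    """Conjunction on the first lines, disjunction on the second lines."""
    return (conj(p[0], q[0]), disj(p[1], q[1]))


def dual_or(p, q):
    """Disjunction on the first lines, conjunction on the second lines."""
    return (disj(p[0], q[0]), conj(p[1], q[1]))


def dual_not(p):
    """Negation: exchange the two lines; no organ is needed."""
    return (p[1], p[0])
```

Only the two monotone organs appear, yet `decode(dual_not(encode(a)))` returns the negation of a, and the outputs always satisfy the condition that exactly one line of each pair is stimulated.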
Summing up: Any two of the three (single delay) organs of Figure 10 (p. 10),
which may simply be designated by $ab$, $a + b$, $a^{-1}$, can be stipulated to be the basic
organs, and yield a system that is essentially equivalent to the original one.


4.2 Single basic organs. 


4.2.1 The Scheffer stroke. 

It is even possible to reduce the number of basic organs to one, although it cannot 
be done with any of the three organs enumerated above. We will, however, introduce 
two new organs, either of which suffices by itself. 

The first universal organ corresponds to the well-known “Scheffer stroke” function. 
Its use in this context was suggested by K. A. Brueckner and M. Gell-Mann. In
symbols, it can be represented (and abbreviated) as shown on Figure 12 (p. 12). The 
three fundamental Boolean operations can now be performed as shown in Figure 13 
(p. 13).

The delays are 2, 2, 1, respectively, and in this case the complication caused by
these delay-relationships is essential. Indeed, the output of the Scheffer-stroke is an
antimonotone function of its inputs. Hence in every network derived from it,
even-delay outputs will be monotone functions of its inputs, and odd-delay outputs will be
antimonotone ones. Now $ab$ and $a + b$ are not antimonotone, and $ab^{-1}$ and $a^{-1}$ are
not monotone. Hence no delay-value can simultaneously accommodate in this set up
one of the two first organs and one of the two last organs.

The difficulty can, however, be overcome as follows: ab and a + b are represented 
in Figure 13 (p. 13), both with the same delay, namely 2. Hence our earlier result 
(in 4.1.2 (p. 9)), securing the adequacy of the system of the two basic organs ab and 
a + b applies: Doubling the unit delay time reduces the present set up (Scheffer stroke 
only!) to the one referred to above. 
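
The delays 2, 2, 1 can be reproduced with a small sketch in which every signal carries its accumulated delay. The wiring used here is the standard construction from the stroke and is an assumption, since the diagram is not reproduced; the function names are likewise illustrative.

```python
def stroke(a, b):
    """Stroke organ on (value, delay) pairs: output is not (a and b),
    one unit delay after the later of its two inputs."""
    (va, da), (vb, db) = a, b
    return (1 - (va & vb), max(da, db) + 1)


def neg(a):
    """Negation: both inputs of one stroke organ tied together."""
    return stroke(a, a)


def conj(a, b):
    """Conjunction: a stroke followed by a negation."""
    return neg(stroke(a, b))


def disj(a, b):
    """Disjunction: stroke of the two negated inputs."""
    return stroke(neg(a), neg(b))
```

With inputs of the form `(value, 0)`, conjunction and disjunction come out with delay 2 and negation with delay 1, in agreement with the text.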


4.2.2 The majority organ. 

The second universal organ is the “majority organ.” In symbols, it is shown (and 
alternatively designated) in Figure 14 (p. 13). To get conjunction and disjunction 
is a simple matter, as shown in Figure 15 (p. 13). Both delays are 1. Thus ab and 
a + b (according to Figure 10 (p. 10)) are correctly represented, and the new system 
(majority organ only!) is adequate because the system based on those two organs is 
known to be adequate (cf. 4.1.2 (p. 9)).
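
A minimal sketch of the majority organ, assuming three inputs and the usage of 4.1.1 in which a "live" line is always stimulated and a "grounded" line never is. Fixing the third input is one natural way to obtain conjunction and disjunction; whether it matches the diagram exactly is an assumption.

```python
GROUNDED, LIVE = 0, 1  # never stimulated / always stimulated (assumed usage)


def majority(a, b, c):
    """Majority organ: stimulated iff at least two of its three inputs are."""
    return int(a + b + c >= 2)


def conj(a, b):
    """ab: third input grounded, so both a and b are needed."""
    return majority(a, b, GROUNDED)


def disj(a, b):
    """a + b: third input live, so either a or b suffices."""
    return majority(a, b, LIVE)
```

Each construct uses a single organ, hence unit delay for both operations.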
5. LOGICS AND INFORMATION.


5.1 Intuitionistic logics. 

All of the examples which have been described in the last two Sections have had a 
certain property in common; in each, a stimulus of one of the inputs at the left could 
be traced through the machine until at a certain time later it came out as a stimulus 
of the output on the right. To be specific, no pulse could ever return to a neuron 
through which it had once passed. A system with this property is called circle-free 
by W. S. McCulloch and W. Pitts [2]. While the theory of circle-free machines is
attractive because of its simplicity, it is not hard to see that these machines are very 
limited in their scope. 

When the assumption of no circles in the network is dropped, the situation is 
radically altered. In this far more complicated case, the output of the machine at 
any time may depend on the state of the inputs in the indefinitely remote past. 
For example, the simplest kind of cyclic circuit, as shown in Figure 16 (p. 14), is a 
kind of memory machine. Once this organ has been stimulated by a, it remains
stimulated and sends forth a pulse in b at all times thereafter. With more complicated
networks, we can construct machines which will count, which will do simple
arithmetic, and which will even perform certain unlimited inductive processes. Some 
of these will be illustrated by examples in Section 6 (p. 16). The use of cycles or 
feedback in automata extends the logic of constructable machines to a large portion 
of intuitionistic logic. Not all of intuitionistic logic is so obtained, however, since 
these machines are limited by their fixed size. (For this and for the remainder of this 
Section; cf. also the remarks at the end of Section 3.3 (p. 8).) Yet if our automata 
are furnished with an unlimited memory — for example an infinite tape, and scanners 
connected to afferent organs, along with suitable efferent organs to perform motor 
operations and/or print on the tape — the logic of constructable machines becomes 
precisely equivalent to intuitionistic logic (see A. M. Turing [1]). In particular, all 
numbers computable in the sense of Turing can be computed by some such network. 
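
The simplest memory circuit can be simulated in discrete time. The sketch assumes the circuit is a threshold one neuron whose two excitatory inputs are the line a and its own output b fed back, so that b at time t + 1 equals "a or b" at time t; the function name and the initial state b = 0 are illustrative assumptions.

```python
def run_memory(a_inputs):
    """Simulate b(t + 1) = a(t) or b(t), starting from b(0) = 0.
    Returns the list [b(0), b(1), ..., b(T)] for T input steps."""
    b = 0
    outputs = [b]
    for a in a_inputs:
        b = int(a or b)  # unit delay: the new state depends on the previous step
        outputs.append(b)
    return outputs


history = run_memory([0, 0, 1, 0, 0])
```

Here `history` is `[0, 0, 0, 1, 1, 1]`: one step after the single stimulation of a, the output b becomes stimulated and stays so, which shows how the output can depend on inputs in the indefinitely remote past.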
5.2 Information.

5.2.1 General observations. 

Our considerations deal with varying situations, each of which contains a certain 
amount of information. It is desirable to have a means of measuring that amount. In 
most cases of importance, this is possible. Suppose an event is one selected from a 
finite set of possible events. Then the number of possible events can be regarded as 
a measure of the information content of knowing which event occurred, provided all 
events are a priori equally probable. However, instead of using the number n of
possible events as the measure of information, it is advantageous to use a certain function
of n, namely the logarithm. This step can be (heuristically) justified as follows: If two
physical systems I and II represent n and m (a priori equally probable) alternatives,
respectively, then the union I+II represents nm such alternatives. Now it is desirable
that the (numerical) measure of information be (numerically) additive under this
(substantively) additive composition I+II. Hence some function f(n) should be used
instead of n, such that

(3) f(nm) = f(n) + f(m). 

In addition, for n > m I represents more information than II, hence it is reasonable 
to require 

(4) n > m implies f(n) > f(m). 

Note that f(n) is defined for n = 1, 2, ... only. From (3), (4) one concludes easily
that

(5) $f(n) = C \ln n$

for some constant C > 0. (Since f(n) is defined for n = 1, 2, ... only, (3) alone does
not imply this, not even with a constant $C \gtrless 0$.) Next, it is conventional to let the
minimum non-vanishing amount of information, i.e. that which corresponds to n = 2,
be the unit of information, the “bit.” This means that f(2) = 1, i.e. $C = 1/\ln 2$,
and so

(6) $f(n) = {}^2\log n$ *

This concept of information was successively developed by several authors in the late 
1920’s and early 1930’s, and finally integrated into a broader system by C. E. Shannon 
([4] part 1). 


5.2.2 Examples 

The following simple examples give some illustration: The outcome of the flip
of a coin is one bit. That of the roll of a die is ${}^2\log 6 \approx 2.6$ bits. A decimal digit
represents ${}^2\log 10 \approx 3.3$ bits, a letter of the alphabet represents ${}^2\log 26 \approx 4.7$ bits, a
single character from a 44-key 2-setting typewriter represents ${}^2\log(44 \times 2) \approx 6.5$ bits.
(In all these we assume, for the sake of the argument, although actually unrealistically,
a priori equal probability of all possible choices.) It follows that any line or nerve
fiber which can be classified as either stimulated or non-stimulated carries precisely
one bit of information, while a bundle of n such lines can communicate n bits. It
is important to observe that this definition is possible only on the assumption that
a background of a priori knowledge exists, namely, the knowledge of a system of a
priori equally probable events.

* ed: ${}^2\log$ here and in subsequent cases is now more usually written as $\log_2$.
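
The figures quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the original lectures; the function name `bits` and the dictionary of examples are illustrative assumptions, and the printed values are unrounded, whereas the text rounds to one decimal place.

```python
import math

def bits(n: int) -> float:
    """Information, in bits, of one choice among n equally probable alternatives."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.log2(n)

# The examples of 5.2.2, all under the equal-probability assumption.
examples = {
    "coin flip": 2,
    "die roll": 6,
    "decimal digit": 10,
    "letter of the alphabet": 26,
    "typewriter character": 44 * 2,
}
for name, n in examples.items():
    print(f"{name:24s} n = {n:3d}   {bits(n):.3f} bits")

# Additivity, eq. (3): a die together with a decimal digit has 6 * 10 alternatives.
print(math.isclose(bits(6 * 10), bits(6) + bits(10)))
```

The last line illustrates why the logarithm is chosen: composing two systems multiplies the number of alternatives and adds the information.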

This definition can be generalized to the case where the possible events are not all
equally probable. Suppose the events are known to have probabilities $p_1, p_2, \ldots, p_n$.
Then the information contained in the knowledge of which of these events actually
occurs, is defined to be

(7) $H = -\sum_{i=1}^{n} p_i \, {}^2\log p_i$ (bits).

In case $p_1 = p_2 = \cdots = p_n = 1/n$, this definition is the same as the previous one.
This result, too, was obtained by C. E. Shannon [4] part 1, although it is implicit in 
the earlier work of L. Szilard [5]. 
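
As a minimal sketch of definition (7), the following computes H for a finite probability distribution and confirms that it reduces to the equal-probability measure of 5.2.1. The function name `entropy_bits` and the "loaded die" probabilities are illustrative assumptions, not taken from the text.

```python
import math

def entropy_bits(probs):
    """H = -sum(p_i * log2(p_i)) in bits; terms with p_i = 0 contribute nothing."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform_die = [1 / 6] * 6                      # all faces equally probable
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]    # hypothetical unequal probabilities

print(entropy_bits(uniform_die), math.log2(6))  # the two values agree
print(entropy_bits(loaded_die))                 # less than log2(6)
```

Unequal probabilities always give less information than the uniform case with the same number of events, which is why the equal-probability figures above are upper limits.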

An important observation about this definition is that it bears close resemblance 
to the statistical definition of the entropy of a thermodynamical system. If the possible 
events are just the known possible states of the system with their corresponding 
probabilities, then the two definitions are identical. Pursuing this, one can construct 
a mathematical theory of the communication of information patterned after statistical 
mechanics. (See L. Szilard [5] and C. E. Shannon [4] part 1.) That information theory 
should thus reveal itself as an essentially thermodynamical discipline, is not at all 
surprising: The closeness and the nature of the connection between information and 
entropy is inherent in L. Boltzmann’s classical definition of entropy (apart from a 
constant, dimensional factor) as the logarithm of the “configuration number.” The 
“configuration number” is the number of a priori equally probable states that are 
compatible with the macroscopic description of the state — i.e. it corresponds to the 
amount of (microscopic) information that is missing in the (macroscopic) description. 


6. TYPICAL SYNTHESES OF AUTOMATA. 

6.1 The memory unit. 

One of the best ways to become familiar with the ideas which have been introduced
is to study some concrete examples of simple networks. This Section is devoted
to a consideration of a few of them.

The first example will be constructed with the help of the three basic organs of 
Figure 10 (p. 10). It is shown in Figure 18 (p. 17). It is a slight rearrangement of the 
primitive memory network of Figure 16 (p. 14).
This network has two inputs a and b and one output x. At time t, x is stimulated
if and only if a has been stimulated at an earlier time, so that no stimulation of b
has occurred since then. Roughly speaking, the machine remembers whether a or b
was the last input to be stimulated. Thus x is stimulated if it has been stimulated
immediately before (to be designated by x') or if a has been stimulated immediately
before, but b has not been stimulated immediately before. This is expressed by the
formula $x = (x' + a)\,b^{-1}$, i.e. by the network shown in Figure 17 (p. 17). Now x
should be fed back into x' (since x' is the immediately preceding state of x). This
gives the network shown in Figure 18 (p. 17), where this branch of x is designated by
y. However, the delay of the first network is 2, hence the second network’s memory
extends over past events that lie an even number of time (delay) units back. I.e. the
output x is stimulated if and only if a has been stimulated at an earlier time, an even
number of units before, so that no stimulation of b has occurred since then, also an
even number of units before. Enumerating the time units by an integer t, it is thus
seen that this network represents a separate memory for even and for odd t. For each
case it is a simple “off-on,” i.e. one bit, memory. Thus it is in its entirety a two bit
memory.
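
The even/odd behavior can be illustrated by a small simulation. This is only a behavioral sketch: the exact wiring and delays of Figures 17 and 18 are not reproduced here, and the code assumes (as a simplification) that both a and b act on x with the full delay of 2 time units. The function name `run_memory` is hypothetical.

```python
def run_memory(a, b, delay=2):
    """Simulate x(t) = (x(t - delay) OR a(t - delay)) AND NOT b(t - delay).

    a, b: equal-length lists of 0/1 input pulses indexed by time t.
    Returns the list of output values x(t). The common delay for a and b
    is an assumption of this sketch, not a statement about Figure 18.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    x = [0] * len(a)
    for t in range(delay, len(a)):
        x[t] = int((x[t - delay] or a[t - delay]) and not b[t - delay])
    return x

T = 14
a = [0] * T
b = [0] * T
a[0] = 1   # set the "even" memory
b[5] = 1   # a clearing pulse at an odd time does not touch the even memory
b[8] = 1   # a clearing pulse at an even time erases it

print(run_memory(a, b))
```

The pulse stored at even times survives the b pulse at t = 5 and is erased only by the b pulse at t = 8, which is the sense in which the network holds one bit for even t and a separate bit for odd t.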




6.2 Scalers. 

In the examples that follow, free use will be made of the general family of basic 
organs considered in 2.3 (p. 4), at least for all p = Xh (cf. eq. (2)(p. 4)). The reduction 
thence to elementary organs in the original sense is secured by the Reduction Theorem 
in 3.2 (p. 7), and in the subsequently developed interpretations, according to Section 
4 (p. 9), by our considerations there. It is therefore unnecessary to concern ourselves 
here with these reductions. 

The second example is a machine which counts input stimuli by twos. It will be 
called a “scaler by two.” Its diagram is shown in Figure 19 (p. 18). 

By adding another input, the repressor, the above mechanism can be turned off 
at will. The diagram becomes as shown in Figure 20 (p. 18). The result will be called 
a “scaler by two” with a repressor and denoted as indicated by Figure 20 (p. 18). 

In order to obtain larger counts, the “scaler by two” networks can be hooked in
series. Thus a “scaler by $2^n$” is shown in Figure 21 (p. 18). The use of the repressor
is of course optional here. “Scalers by m,” where m is not necessarily of the form $2^n$,
can also be constructed with little difficulty, but we will not go into this here.
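
The series connection can be sketched behaviorally as follows. This is not the gate-level network of Figures 19 to 21; it only assumes that a “scaler by two” emits one output pulse for every second input pulse, and it omits the repressor. The class and function names are hypothetical.

```python
class ScalerByTwo:
    """Behavioral model: emits one output pulse for every second input pulse."""

    def __init__(self):
        self.parity = 0  # number of input pulses seen so far, modulo 2

    def step(self, pulse: int) -> int:
        if not pulse:
            return 0
        self.parity ^= 1
        return int(self.parity == 0)  # fire on the 2nd, 4th, 6th, ... pulse


def feed(chain, pulse: int) -> int:
    """Pass one time step through scalers hooked in series."""
    for stage in chain:
        pulse = stage.step(pulse)
    return pulse


# A "scaler by 2^3": three stages in series, driven by 16 input pulses.
chain = [ScalerByTwo() for _ in range(3)]
outputs = [feed(chain, 1) for _ in range(16)]
print(outputs)
```

With three stages the output fires on the 8th and 16th input pulse only, i.e. the chain counts by $2^3$.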

6.3 Learning.

Using these “scalers by $2^n$” (i.e. n-stage counters), it is possible to construct
the following sort of “learning device.” This network has two inputs a and b. It is
designed to learn that whenever a is stimulated, then, in the next instant, b will be
stimulated. If this occurs 256 times (not necessarily consecutively and possibly with
many exceptions to the rule), the machine learns to anticipate a pulse from b one
unit of time after a has been active, and expresses this by being stimulated at its
b output after every stimulation of a. The diagram is shown in Figure 22 (p. 20).
(The “expression” described above will be made effective in the desired sense by the
network of Figure 24 (p. 20), cf. its discussion below.)

This is clearly learning in the crudest and most inefficient way, only. With some
effort, it is possible to refine the machine so that, first, it will learn only if it receives
no counter-instances of the pattern “b follows a” during the time when it is collecting
these 256 instances; and, second, having once learned, the machine can unlearn by
the occurrence of 64 counter-examples to “b follows a” if no (positive) instances of
this pattern interrupt the (negative) series. Otherwise, the behavior is as before. The
diagram is shown in Figure 23 (p. 20). To make this learning effective, one has to
use x to gate a so as to replace b at its normal functions. Let these be represented
by an output c. Then this process is mediated by the network shown in Figure 24
(p. 20). This network must then be attached to the lines a, b and to the output x of
the preceding network (according to Figures 22, 23 (p. 20)).


7. THE ROLE OF ERROR.

7.1 Exemplification with the help of the memory unit. 

In all the previous considerations, it has been assumed that the basic components 
were faultless in their performance. This assumption is clearly not a very realistic 
one. Mechanical devices as well as electrical ones are statistically subject to failure, 
and the same is probably true for animal neurons too. Hence it is desirable to find 
a closer approximation to reality as a basis for our constructions, and to study this 
revised situation. The simplest assumption concerning errors is this: With every basic
organ is associated a positive number $\epsilon$ such that in any operation, the organ will fail
to function correctly with the (precise) probability $\epsilon$. This malfunctioning is assumed
to occur statistically independently of the general state of the network and of the
occurrence of other malfunctions. A more general assumption, which is a good deal
more realistic, is this: The malfunctions are statistically dependent on the general
state of the network and on each other. In any particular state, however, a malfunction
of the basic organ in question has a probability which is $\le \epsilon$. For
the present occasion, we make the first (narrower and simpler) assumption, and that
with a single $\epsilon$: Every neuron has statistically independently of all else exactly the
probability $\epsilon$ of misfiring. Evidently, it might as well be supposed $\epsilon \le 1/2$, since an
organ which consistently misbehaves with a probability $> 1/2$ is just behaving with
the negative of its attributed function, and a (complementary) probability of error
$< 1/2$. Indeed, if the organ is thus redefined as its own opposite, its $\epsilon$ ($> 1/2$) goes then
over into $1 - \epsilon$ ($< 1/2$). In practice it will be found necessary to have $\epsilon$ a rather small
number, and one of the objectives of this investigation is to find the limits of this
smallness, such that useful results can still be achieved.

It is important to emphasize that the difficulty introduced by allowing error is 
not so much that incorrect information will be obtained, but rather that irrelevant 
results will be produced. As a simple example, consider the memory organ of Figure 
16 (p. 14). Once stimulated, this network should continue to emit pulses at all later 
times; but suppose it has the probability $\epsilon$ of making an error. Suppose the organ
receives a stimulation at time t and no later ones. Let the probability that the organ
is still excited after s cycles be denoted $p_s$. Then the recursion formula

$p_{s+1} = (1 - \epsilon)\,p_s + \epsilon\,(1 - p_s)$

is clearly satisfied. This can be written

$p_{s+1} - 1/2 = (1 - 2\epsilon)(p_s - 1/2)$

and so

(8) $p_s - 1/2 = (1 - 2\epsilon)^s (p_0 - 1/2) \approx e^{-2\epsilon s}(p_0 - 1/2)$

for small $\epsilon$. The quantity $p_s - 1/2$ can be taken as a rough measure of the amount
of discrimination in the system after the s-th cycle. According to the above formula,
$p_s \to 1/2$ as $s \to \infty$, a fact which is expressed by saying that, after a long time,
the memory content of the machine disappears, since it tends to equal likelihood of
being right or wrong, i.e. to irrelevancy.
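
The decay described by (8) is easy to verify numerically. The sketch below iterates the recursion and compares it with the closed form and with the exponential approximation; the value $\epsilon = 0.01$ and the function names are illustrative assumptions.

```python
import math

def excitation_probability(eps: float, s: int, p0: float = 1.0) -> float:
    """Iterate p_{s+1} = (1 - eps) * p_s + eps * (1 - p_s) for s cycles."""
    p = p0
    for _ in range(s):
        p = (1 - eps) * p + eps * (1 - p)
    return p

def closed_form(eps: float, s: int, p0: float = 1.0) -> float:
    """Eq. (8): p_s = 1/2 + (1 - 2*eps)**s * (p0 - 1/2)."""
    return 0.5 + (1 - 2 * eps) ** s * (p0 - 0.5)

eps = 0.01  # hypothetical error probability per cycle
for s in (0, 10, 100, 1000):
    iterated = excitation_probability(eps, s)
    exact = closed_form(eps, s)
    approx = 0.5 + math.exp(-2 * eps * s) * 0.5  # small-eps approximation
    print(f"s = {s:5d}  iterated = {iterated:.6f}  closed form = {exact:.6f}  exp approx = {approx:.6f}")
```

For these values the discrimination $p_s - 1/2$ falls by a factor of roughly $e$ every $1/(2\epsilon) = 50$ cycles.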

7.2 The general definition.

This example is typical of many. In a complicated network, with long stimulus- 
response chains, the probability of errors in the basic organs makes the response of 
the final outputs unreliable, i.e. irrelevant, unless some control mechanism prevents 
the accumulation of these basic errors. We will consider two aspects of this problem. 
Let the data be these: The function which the automaton is to perform is given; a
basic organ is given (Scheffer stroke, for example); a number $\epsilon$ ($< 1/2$), which is the
probability of malfunctioning of this basic organ, is prescribed. The first question is:
Given $\delta > 0$, can a corresponding automaton be constructed from the given organs,
which will perform the desired function and will commit an error (in the final result,
i.e. output) with probability $\le \delta$? How small can $\delta$ be prescribed? The second
question is: Are there other ways to interpret the problem which will allow us to
improve the accuracy of the result?


7.3 An apparent limitation. 

In partial answer to the first question, we notice now that $\delta$, the prescribed maximum
allowable (final) error of the machine, must not be less than $\epsilon$. For any output
of the automaton is the immediate result of the operation of a single final neuron and
the reliability of the whole system cannot be better than the reliability of this last
neuron.


[Fig. 25. Each group represents a bundle of N lines.]

7.4 The multiple line trick. 

In answer to the second question, a method will be analyzed by which this threshold
restriction $\delta \ge \epsilon$ can be removed. In fact we will be able to prescribe $\delta$ arbitrarily
small (for suitable, but fixed, $\epsilon$). The trick consists in carrying all the messages
simultaneously on a bundle of N lines (N is a large integer) instead of just a single
or double strand as in the automata described up to now. An automaton would then
be represented by a black box with several bundles of inputs and outputs, as shown
in Figure 25 (p. 22). Instead of requiring that all or none of the lines of the bundle be
stimulated, a certain critical (or fiduciary) level $\Delta$ is set: $0 < \Delta < 1/2$. The stimulation
of $\ge (1 - \Delta)N$ lines of a bundle is interpreted as a positive state of the bundle. The
stimulation of $\le \Delta N$ lines is considered as a negative state. All levels of stimulation
between these values are intermediate or undecided. It will be shown that by suitably
constructing the automaton, the number of lines deviating from the “correctly
functioning” majorities of their bundles can be kept at or below the critical level
$\Delta N$ (with arbitrarily high probability). Such a system of construction is referred to
as “multiplexing.” Before turning to the multiplexed automata, however, it is well
to consider the ways in which error can be controlled in our customary single line
networks.
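
The interpretation rule for a bundle can be written down directly. The following is a minimal sketch of that rule only (not of a multiplexed automaton); the function name `bundle_state`, the labels it returns, and the sample values N = 100, $\Delta = 0.1$ are assumptions made for illustration.

```python
def bundle_state(lines, delta: float) -> str:
    """Classify a bundle of 0/1 lines against the critical level delta.

    At least (1 - delta) * N stimulated lines: positive state.
    At most delta * N stimulated lines: negative state.
    Anything in between: intermediate, i.e. undecided.
    """
    if not 0 < delta < 0.5:
        raise ValueError("delta must satisfy 0 < delta < 1/2")
    n = len(lines)
    stimulated = sum(lines)
    if stimulated >= (1 - delta) * n:
        return "positive"
    if stimulated <= delta * n:
        return "negative"
    return "undecided"

N, delta = 100, 0.1  # hypothetical bundle size and critical level
print(bundle_state([1] * 95 + [0] * 5, delta))   # positive
print(bundle_state([1] * 5 + [0] * 95, delta))   # negative
print(bundle_state([1] * 50 + [0] * 50, delta))  # undecided
```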


8. CONTROL OF ERROR IN SINGLE LINE AUTOMATA 

8.1 The simplified probability assumption. 

In 7.3 (p. 22) it was indicated that when dealing with an automaton in which 
messages are carried on a single (or even a double) line, and in which the components 
have a definite probability $\epsilon$ of making an error, there is a lower bound to the error
probability of the machine as a whole. It will now be shown that it is nevertheless possible
to keep the accuracy within reasonable bounds by suitably designing the network. 
For the sake of simplicity only circle-free automata (cf. 5.1 (p. 14)) will be considered 
in this Section, although the conclusions could be extended, with proper safeguards, 
to all automata. Of the various essentially equivalent systems of basic organs (cf. 
Section 4 (p. 9)) it is, in the present instance, most convenient to select the majority 
organ, which is shown in Figure 14 (p. 13), as the basic organ for our networks. The 
number $\epsilon$ ($0 < \epsilon < 1/2$) will denote the probability each majority organ has for
malfunctioning.

8.2 The majority organ. 

We first investigate upper bounds for the probability of errors as impulses pass
through a single majority organ of a network. Three lines constitute the inputs of
the majority organ. They come from other organs or are external inputs of the
network. Let $\eta_1, \eta_2, \eta_3$ be three numbers ($0 < \eta_i \le 1$), which are respectively upper
bounds for the probabilities that these lines will be carrying the wrong impulses.
Then $\epsilon + \eta_1 + \eta_2 + \eta_3$ is an upper bound for the probability that the output line
of the majority organ will act improperly. This upper bound is valid in all cases.
Under proper circumstances it can be improved. In particular, assume: (i) The
probabilities of errors in the input lines are independent, (ii) under proper functioning
of the network, these lines should always be in the same state of excitation (either all
stimulated, or all unstimulated). In this latter case

$\Theta = \eta_1\eta_2 + \eta_1\eta_3 + \eta_2\eta_3 - 2\eta_1\eta_2\eta_3$

is an upper bound for the probability of at least two of the input lines carrying the
wrong impulses, and thence

$\vartheta = (1 - \epsilon)\Theta + \epsilon(1 - \Theta) = \epsilon + (1 - 2\epsilon)\Theta$

is a smaller upper bound for the probability of failure in the output line. If all $\eta_i \le \eta$,
then $\epsilon + 3\eta$ is a general upper bound, and

$\epsilon + (1 - 2\epsilon)(3\eta^2 - 2\eta^3) \le \epsilon + 3\eta^2$

is an upper bound for the special case. Thus it appears that in the general case each
operation of the automaton increases the probability of error since $\epsilon + 3\eta > \eta$, so
that if the serial depth of the machine (or rather of the process to be performed) is
very great, it will be impractical or impossible to obtain any kind of accuracy. In the
special case, on the other hand, this is not necessarily so: $\epsilon + 3\eta^2 < \eta$ is possible.
Hence, the chance of keeping the error under control lies in maintaining the conditions
of the special case throughout the construction. We will now exhibit a method which
achieves this.
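
To make the contrast between the two bounds concrete, the sketch below evaluates both for sample values. The numbers $\epsilon = 0.005$ and $\eta = 0.02$ are hypothetical, chosen only for illustration, and the function names are not from the text.

```python
def general_bound(eps: float, etas) -> float:
    """Upper bound valid in all cases: eps + eta_1 + eta_2 + eta_3."""
    return eps + sum(etas)

def special_bound(eps: float, etas) -> float:
    """Upper bound when the input errors are independent and the three
    inputs should agree: eps + (1 - 2*eps) * Theta."""
    e1, e2, e3 = etas
    theta = e1 * e2 + e1 * e3 + e2 * e3 - 2 * e1 * e2 * e3
    return eps + (1 - 2 * eps) * theta

eps, eta = 0.005, 0.02  # hypothetical organ and input error levels
etas = (eta, eta, eta)
print("general:", general_bound(eps, etas))  # larger than eta: error grows
print("special:", special_bound(eps, etas))  # smaller than eta: error shrinks
```

In the general case the bound on the output error exceeds the input error level, while in the special case it falls below it, which is the property the subsequent construction exploits.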

8.3 Synthesis of Automata. 


8.3.1 The heuristic argument. 

The basic idea in this procedure is very simple. Instead of running the incoming 
data into a single machine, the same information is simultaneously fed into a number 
of identical machines, and the result that comes out of a majority of these machines 
is assumed to be true. It must be shown that this technique can really be used to 
control error. 

Denote by O the given network (assume two outputs in the specific instance
pictured in Figure 26 (p. 24)). Construct O in triplicate, labeling the copies $O^1, O^2, O^3$
respectively. Consider the system shown in Figure 26 (p. 24).



[Fig. 26]


For each of the final majority organs the conditions of the special case considered
above obtain. Consequently, if $\eta$ is an upper bound for the probability of error at any
output of the original network O, then

(9) $\eta^* = \epsilon + (1 - 2\epsilon)(3\eta^2 - 2\eta^3) \equiv f_\epsilon(\eta)$

is an upper bound for the probability of error at any output of the new network O*.
The graph is the curve $\eta^* = f_\epsilon(\eta)$, shown in Figure 27 (p. 25).

[Fig. 27]


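Consider the intersections of the curve with the diagonal $\eta^* = \eta$: First, $\eta = 1/2$ is
at any rate such an intersection. Dividing $\eta - f_\epsilon(\eta)$ by $\eta - 1/2$ gives
$2\left((1 - 2\epsilon)\eta^2 - (1 - 2\epsilon)\eta + \epsilon\right)$, hence the other intersections are the roots of
$(1 - 2\epsilon)\eta^2 - (1 - 2\epsilon)\eta + \epsilon = 0$, i.e.

$\eta = \frac{1}{2}\left(1 \pm \sqrt{\frac{1 - 6\epsilon}{1 - 2\epsilon}}\right)$.

I.e. for $\epsilon \ge 1/6$ they do not exist (being complex (for $\epsilon > 1/6$) or $= 1/2$ (for $\epsilon = 1/6$)); while
for $\epsilon < 1/6$ they are $\eta = \eta_0,\ 1 - \eta_0$, where

(10) $\eta_0 = \frac{1}{2}\left(1 - \sqrt{\frac{1 - 6\epsilon}{1 - 2\epsilon}}\right) = \epsilon + 3\epsilon^2 + \ldots$

For $\eta = 0$: $\eta^* = \epsilon > \eta$. This, and the monotony and continuity of $\eta^* = f_\epsilon(\eta)$, therefore
imply:

First case, $\epsilon \ge 1/6$: $0 \le \eta < 1/2$ implies $\eta < \eta^* < 1/2$; $1/2 < \eta \le 1$ implies $1/2 < \eta^* < \eta$.

Second case, $\epsilon < 1/6$: $0 \le \eta < \eta_0$ implies $\eta < \eta^* < \eta_0$; $\eta_0 < \eta < 1/2$ implies
$\eta_0 < \eta^* < \eta$; $1/2 < \eta < 1 - \eta_0$ implies $\eta < \eta^* < 1 - \eta_0$; $1 - \eta_0 < \eta \le 1$ implies
$1 - \eta_0 < \eta^* < \eta$.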

Now we must expect numerous successive occurrences of the situation under consideration,
if it is to be used as a basic procedure. Hence the iterative behavior of
the operation $\eta \to \eta^* = f_\epsilon(\eta)$ is relevant. Now it is clear from the above, that in the
first case the successive iterates of the process in question always converge to 1/2, no
matter what the original $\eta$; while in the second case these iterates converge to $\eta_0$ if
the original $\eta < 1/2$, and to $1 - \eta_0$ if the original $\eta > 1/2$.

In other words: In the first case no error level other than $\eta \sim 1/2$ can maintain
itself in the long run. I.e. the process asymptotically degenerates to total irrelevance,
like the one discussed in 7.1 (p. 21). In the second case the error-levels $\eta \sim \eta_0$ and
$\eta \sim 1 - \eta_0$ will not only maintain themselves in the long run, but they represent the
asymptotic behavior for any original $\eta < 1/2$ or $\eta > 1/2$, respectively.

These arguments, although heuristic, make it clear that the second case alone can
be used for the desired error-level control. I.e. we must require $\epsilon < 1/6$, i.e. the error-level
for a single basic organ function must be less than ~ 16%. The stable, ultimate
error-level should then be $\eta_0$ (we postulate, of course, that the start be made with an
error-level $\eta < 1/2$). $\eta_0$ is small if $\epsilon$ is, hence $\epsilon$ must be small, and so

(11) $\eta_0 = \epsilon + 3\epsilon^2 + \ldots$

This would therefore give an ultimate error-level of ~ 10% (i.e. $\eta \sim .1$) for a single
basic organ function error-level of ~ 8% (i.e. $\epsilon \sim .08$).
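
The two cases can be checked by iterating the map (9) numerically. The sketch below is illustrative only; the function names, the starting values, and the choice $\epsilon = 0.2$ for the first case are assumptions, while $\epsilon = 0.08$ is the example quoted in the text.

```python
import math

def f(eps: float, eta: float) -> float:
    """Eq. (9): output error bound after one triplication-and-majority step."""
    return eps + (1 - 2 * eps) * (3 * eta ** 2 - 2 * eta ** 3)

def eta0(eps: float) -> float:
    """Eq. (10): the lower stable error level; requires eps < 1/6."""
    if eps >= 1 / 6:
        raise ValueError("no fixed point other than 1/2 for eps >= 1/6")
    return 0.5 * (1 - math.sqrt((1 - 6 * eps) / (1 - 2 * eps)))

def iterate(eps: float, eta: float, steps: int = 2000) -> float:
    """Apply eta -> f(eps, eta) repeatedly."""
    for _ in range(steps):
        eta = f(eps, eta)
    return eta

# Second case (eps < 1/6): iterates settle at eta0 or at 1 - eta0.
eps = 0.08
print(eta0(eps), iterate(eps, 0.3), iterate(eps, 0.7))

# First case (eps >= 1/6): every starting level drifts to 1/2.
print(iterate(0.2, 0.05), iterate(0.2, 0.95))
```

For $\epsilon = 0.08$ the stable level comes out slightly above 0.10, in line with the ~ 10% quoted above.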


8.3.2 The rigorous argument. 

To make this heuristic argument binding, it would be necessary to construct an 
error controlling network V* for any given network V, so that all basic organs in it 
are so connected as to put them into the special case of a majority organ, as discussed 
above. This will not be uniformly possible, and it will therefore be necessary to modify 
the above heuristic argument, although its general pattern will be maintained. 

It is then desired to find, for any given network V, an essentially equivalent 
network V*, which is error-safe in some suitable sense, that conforms with the ideas
expressed so far. We will define this as meaning, that for each output line of V*
(corresponding to one of V), the (separate) probability of an incorrect message (over
this line) is $\le \eta_1$. The value of $\eta_1$ will result from the subsequent discussion.

The construction will be an induction over the longest serial chain of basic organs
in V, say $\mu = \mu(V)$.

Consider the structure of V. The number of its inputs i and outputs a is arbitrary, 
but every output of V must either come from a basic organ in V, or directly from an 
input, or from a ground or live source. Omit the first mentioned basic organs from V, 
as well as the outputs other than the first mentioned ones, and designate the network 
that is left over by Q. This is schematically shown in Figure 28 (p. 26). (Some of the
apparently separate outputs of Q may be split lines coming from a single one, but
this is irrelevant for what follows.)

If Q is void, then there is nothing to prove; let therefore Q be non-void. Then
clearly $\mu(Q) = \mu(V) - 1$.

Hence the induction permits us to assume the existence of a network Q* which
is essentially equivalent to Q, and has for each output a (separate) error-probability
$\le \eta_1$.

We now provide three copies of Q*: $Q^{*1}, Q^{*2}, Q^{*3}$, and construct V* as shown
in Figure 29 (p. 28). (Instead of drawing the rather complicated connections across
the two dotted areas, they are indicated by attaching identical markings to endings
that should be connected.)

Now the (separate) output error-probabilities of Q* are (by inductive assumption)
$\le \eta_1$. The majority organs in the bottom row in Figure 29 (p. 28)* (those without
a □) are so connected as to belong into the special case for a majority organ (cf.
8.2 (p. 23)), hence their outputs have (separate) error-probabilities $\le f_\epsilon(\eta_1)$. The
majority organs in the top row in Figure 29 (p. 28) (those with a □) are in the
general case, hence their (separate) error-probabilities are $\le \epsilon + 3 f_\epsilon(\eta_1)$.

Consequently the inductive step succeeds, and therefore the attempted inductive
proof is binding if

(12) $\epsilon + 3 f_\epsilon(\eta_1) \le \eta_1$.


* ed: Figure 29 is shown “vertically”. The original text referred to columns instead
of rows which assumed that the Figure was viewed as rotated by +90°.

8.4 Numerical evaluation.

Substituting the expression (9) for $f_\epsilon(\eta)$ into condition (12) gives

$4\epsilon + 3(1 - 2\epsilon)(3\eta_1^2 - 2\eta_1^3) \le \eta_1$,

i.e.

$\eta_1^3 - \frac{3}{2}\eta_1^2 + \frac{1}{6(1 - 2\epsilon)}\eta_1 - \frac{2\epsilon}{3(1 - 2\epsilon)} \ge 0$.

Clearly the smallest $\eta_1 > 0$ fulfilling this condition is wanted. Since the left hand side
is $< 0$ for $\eta_1 \le 0$, this means the smallest (real, and hence, by the above, positive)
root of

(13) $\eta_1^3 - \frac{3}{2}\eta_1^2 + \frac{1}{6(1 - 2\epsilon)}\eta_1 - \frac{2\epsilon}{3(1 - 2\epsilon)} = 0$.


We know from the preceding heuristic argument, that $\epsilon < 1/6$ will be necessary, but
actually even more must be required. Indeed, for $\eta_1 = 1/2$ the left hand side of (13)
is $= -(1 + \epsilon)/(6 - 12\epsilon) < 0$, hence a significant and acceptable $\eta_1$ (i.e. an $\eta_1 < 1/2$),
can be obtained from (13) only if it has three real roots. A simple calculation shows,
that for $\epsilon = 1/6$ only one real root exists, $\eta_1 = 1.425$. Hence the limiting $\epsilon$ calls for
the existence of a double root. Further calculation shows, that the double root in
question occurs for $\epsilon = .0073$, and that its value is $\eta_1 = .060$. Consequently $\epsilon < .0073$
is the actual requirement, i.e. the error-level of a single basic organ function must be
< .73%. The stable, ultimate error-level is then the smallest positive root $\eta_1$ of (13).
$\eta_1$ is small if $\epsilon$ is, hence $\epsilon$ must be small, and so (from (13))

$\eta_1 = 4\epsilon + 152\epsilon^2 + \ldots$

It is easily seen, that e.g. an ultimate error level of 2% (i.e. $\eta_1 = .02$) calls for a single
basic organ function error-level of .41% (i.e. $\epsilon = .0041$).

This result shows that errors can be controlled. But the method of construction
used in the proof about triples the number of basic organs in V* for an increase
of $\mu(V)$ by 1, hence V* has to contain about $3^{\mu(V)}$ such organs. Consequently the
procedure is impractical.

The restriction $\epsilon < .0073$ has no absolute significance. It could be relaxed by
iterating the process of triplication at each step. The inequality $\epsilon < 1/6$ is essential,
however, since our first argument showed, that for $\epsilon \ge 1/6$ even for a basic organ in the
most favorable situation (namely in the “special” one) no interval of improvement
exists.
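
The smallest positive root of (13) can be found numerically without solving the cubic in closed form. The sketch below is one possible approach, not the calculation used in the lectures: it iterates the map $\eta \to \epsilon + 3 f_\epsilon(\eta)$ from $\eta = 0$, which increases monotonically toward the smallest fixed point when one exists below 1/2. Function names and tolerances are assumptions.

```python
def f(eps: float, eta: float) -> float:
    """Eq. (9): special-case bound for a majority organ."""
    return eps + (1 - 2 * eps) * (3 * eta ** 2 - 2 * eta ** 3)

def smallest_eta1(eps: float, tol: float = 1e-12, max_iter: int = 100000):
    """Smallest eta_1 with eps + 3 * f(eps, eta_1) = eta_1, or None.

    The map eta -> eps + 3 * f(eps, eta) is increasing on [0, 1], so the
    iterates started at 0 rise toward the smallest fixed point. If they
    pass 1/2 instead, no acceptable error level exists for this eps.
    """
    eta = 0.0
    for _ in range(max_iter):
        nxt = eps + 3 * f(eps, eta)
        if nxt >= 0.5:
            return None
        if nxt - eta < tol:
            return nxt
        eta = nxt
    return None

print(smallest_eta1(0.0041))  # close to the 2% level quoted in the text
print(smallest_eta1(0.01))    # None: eps exceeds the limit of about .0073
```

Convergence becomes slow as $\epsilon$ approaches the limiting value, where the two smaller roots merge into the double root.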

9. THE TECHNIQUE OF MULTIPLEXING.

9.1 General remarks on multiplexing. 

The general process of multiplexing in order to control error was already referred to
in 7.4 (p. 22). The messages are carried on N lines. A positive number $\Delta$ ($< 1/2$) is
chosen and the stimulation of $\ge (1 - \Delta)N$ lines of the bundle is interpreted as a
positive message, the stimulation of $\le \Delta N$ lines as a negative message. Any other
number of stimulated lines is interpreted as malfunction. The complete system must
be organized in such a manner, that a malfunction of the whole automaton cannot 
be caused by the malfunctioning of a single component, or of a small number of 
components, but only by the malfunctioning of a large number of them. As we will see 
later, the probability of such occurrences can be made arbitrarily small provided the 
number of lines in each bundle is made sufficiently great. All of Section 9 (p. 30) will 
be devoted to a description of the method of constructing multiplexed automata and 
its discussion, without considering the possibility of error in the basic components. In 
Section 10 (p. 37) we will then introduce errors in the basic components, and estimate 
their effects. 

9.2 The majority organ. 

9.2.1 The basic executive organ. 

The first thing to consider is the method of constructing networks which will 
perform the tasks of the basic organs for bundles of inputs and outputs instead of 
single lines. 

A simple example will make the process clear: Consider the problem of constructing
the analogue of the majority organ which will accommodate bundles of five lines.
This is easily done using the ordinary majority organ of Figure 14 (p. 13), as shown
in Figure 30 (p. 30). (The connections are replaced by suitable markings, in the same
way as in Figure 29 (p. 28).)

9.2.2 The need for a restoring organ.

It is intuitively clear that if almost all lines of the input bundles are stimulated, 
then almost all lines of the output bundle will be stimulated. Similarly if almost 
none of the lines of two of the input bundles are stimulated, then the mechanism will 
stimulate almost none of its output lines. However, another fact is brought to light. 
Suppose that a critical level $\Delta = 1/5$ is set for the bundles. Then if two of the input
bundles have 4 lines stimulated while the other has none, the output may have only 
3 lines stimulated. The same effect prevails in the negative case. If two bundles have 
just one input each stimulated, while the third bundle has all of its inputs stimulated, 
then the resulting output may be the stimulation of two lines. In other words, the 
relative number of lines in the bundle, which are not in the majority state, can double 
in passing through the generalized majority system. A more careful analysis (similar 
to the one that will be gone into in more detail for the case of the Scheffer organ in 
Section 10 (p. 37)) shows the following: If, in some situation, the operation of the 
organ should be governed by a two-to-one majority of the input bundles (i.e. if two 
of the bundles are both prevalently stimulated or both prevalently non-stimulated, 
while the third one is in the opposite condition), then the most probable level of 
the output error will be (approximately) the sum of the errors in the two governing 
input bundles; on the other hand in an operation in which the organ is governed 
by a unanimous behavior of its input bundles (i.e. if all three of the bundles are 
prevalently stimulated or all three are prevalently non-stimulated), then the output 
error will generally be smaller than the (maximum of the) input errors. Thus in the 
significant case of two-to-one majorization, two significant inputs may combine to 
produce a result lying in the intermediate region of uncertain information. What is 
needed therefore is a new type of organ which will restore the original stimulation 
level. In other words, we need a network having the property that, with a fairly high 
degree of probability, it transforms an input bundle with a stimulation level which 
is near to zero or to one into an output bundle with stimulation level which is even 
closer to the corresponding extreme. 
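
The five-line example can be reproduced in a few lines. The sketch assumes, for illustration only, that line i of each input bundle feeds the i-th majority organ; the actual connection pattern of Figure 30 is not reproduced here, and the specific positions of the deviating lines are chosen to exhibit the worst case.

```python
def majority(a: int, b: int, c: int) -> int:
    """Single majority organ: stimulated if at least two inputs are."""
    return int(a + b + c >= 2)

def bundle_majority(A, B, C):
    """Line-by-line majority over three bundles of equal size."""
    return [majority(a, b, c) for a, b, c in zip(A, B, C)]

# Positive case: two bundles with 4 of 5 lines stimulated, the third with none.
A = [0, 1, 1, 1, 1]
B = [1, 0, 1, 1, 1]
C = [0, 0, 0, 0, 0]
print(bundle_majority(A, B, C), sum(bundle_majority(A, B, C)))  # 3 lines stimulated

# Negative case: two bundles with 1 of 5 lines stimulated, the third with all.
A = [1, 0, 0, 0, 0]
B = [0, 1, 0, 0, 0]
C = [1, 1, 1, 1, 1]
print(bundle_majority(A, B, C), sum(bundle_majority(A, B, C)))  # 2 lines stimulated
```

In both cases each governing bundle has one deviating line out of five, while the output has two, so the fraction of lines outside the majority state has doubled and falls in the undecided region for $\Delta = 1/5$.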

Thus the multiplexed systems must contain two types of organs. The first type is 
the executive organ which performs the desired basic operations on the bundles. The 
second type is an organ which restores the stimulation level of the bundles, and hence 
erases the degradation caused by the executive organs. This situation has its analog 
in many of the real automata which perform logically complicated tasks. For example 
in electrical circuits, some of the vacuum tubes perform executive functions, such as 
detection or rectification or gating or coincidence-sensing, while the remainder are 
assigned the task of amplification, which is a restorative operation. 


9.2.3 The restoring organ. 

9.2.3.1 Construction. 

The construction of a restoring organ is quite simple in principle, and in fact 
contained in the second remark made in 9.2.2 (p. 31). In a crude way, the ordinary 
majority organ already performs this task. Indeed in the simplest case, for a bundle
of three lines, the majority organ has precisely the right characteristics: It suppresses
a single incoming impulse as well as a single incoming non-impulse, i.e. it amplifies
the prevalence of the presence as well as of the absence of impulses. To display this
trait most clearly, it suffices to split its output line into three lines, as shown in Figure
31 (p. 32).



Fig. 31 


Now for large bundles, in the sense of the remark referred to above, concerning 
the reduction of errors in the case of a response induced by a unanimous behavior 
of the input bundles, it is possible to connect up majority organs in parallel and 
thereby produce the desired restoration. However, it is necessary to assume that the 
stimulated (or non-stimulated) lines are distributed at random in the bundle. This 
randomness must then be maintained at all times. The principle is illustrated by 
Figure 32 (p.33). The “black box” U is supposed to permute the lines of the input 
bundle that pass through it, so as to restore the randomness of the pulses in its lines. 
This is necessary, since to the left of U the input bundle consists of a set of triads, 
where the lines of each triad originate in the splitting of a single line, and hence are 
always all three in the same condition. Yet, to the right of U the lines of the corresponding 
triad must be statistically independent, in order to permit the application 
of the statistical formula to be given below for the functioning of the majority organ 
into which they feed. The way to select such a “randomizing” permutation will not 
be considered here — it is intuitively plausible that most “complicated” permutations 
will be suited for this “randomizing” role. (cf. 11.2 (p. 48)) 

9.2.3.2 Numerical evaluation. 

If $\alpha N$ of the $N$ incoming lines are stimulated, then the probability of any majority 
organ being stimulated (by two or three stimulated inputs) is 

(14)    $\alpha^* = 3\alpha^2 - 2\alpha^3 = g(\alpha)$. 

Thus approximately (i.e. with high probability, provided $N$ is large) $\sim \alpha^* N$ outputs 
will be excited. Plotting the curve of $\alpha^*$ against $\alpha$, as shown in Figure 33 (p. 33), 
indicates clearly that this organ will have the desired characteristics: 

This curve intersects the diagonal $\alpha^* = \alpha$ three times: For $\alpha = 0, \frac12, 1$. 
$0 < \alpha < \frac12$ implies $0 < \alpha^* < \alpha$; $\frac12 < \alpha < 1$ implies 
$\alpha < \alpha^* < 1$. I.e. successive iterates of this process converge to 0 if the 
original $\alpha < \frac12$ and to 1 if the original $\alpha > \frac12$. 

In other words: The error levels $\alpha \sim 0$ and $\alpha \sim 1$ will not only maintain 
themselves in the long run but they represent the asymptotic behavior for any original 
$\alpha < \frac12$ or $\alpha > \frac12$ respectively. Note, that because of 
$g(1 - \alpha) = 1 - g(\alpha)$ there is complete symmetry between the $\alpha < \frac12$ 
region and the $\alpha > \frac12$ region. 

The process $\alpha \to \alpha^*$ thus brings every $\alpha$ nearer to that one of 0 and 1, to 
which it was nearer originally. This is precisely that process of restoration, which was seen in 
9.2.2 (p. 31) to be necessary. I.e. one or more (successive) applications of this process 
will have the required restoring effect. 

Note that this process of restoration is most effective when 
$\alpha - \alpha^* = 2\alpha^3 - 3\alpha^2 + \alpha$ has its minimum or maximum, i.e. for 
$6\alpha^2 - 6\alpha + 1 = 0$, i.e. for $\alpha = (3 \pm \sqrt{3})/6 = .788, .212$. Then 
$\alpha - \alpha^* = \pm .096$. I.e. the maximum restoration is effected on error 
levels at the distance of 21.2% from 0% or 100% - these are improved (brought 
nearer) by 9.6%. 
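
The behavior of the iteration can be checked numerically. The following is a minimal sketch, not part of the original lectures: the names `g`, `iterate`, `ALPHA_LOW` and `ALPHA_HIGH` are chosen here for illustration only. It applies formula (14) repeatedly and evaluates the gain per stage at the roots of $6\alpha^2 - 6\alpha + 1 = 0$.

```python
import math


def g(alpha: float) -> float:
    """Formula (14): expected fraction of majority organs that are stimulated."""
    return 3 * alpha**2 - 2 * alpha**3


def iterate(f, alpha: float, stages: int) -> float:
    """Apply the map f to alpha `stages` times in succession."""
    for _ in range(stages):
        alpha = f(alpha)
    return alpha


# Levels of most effective restoration: the two roots of 6a^2 - 6a + 1 = 0.
ALPHA_LOW = (3 - math.sqrt(3)) / 6
ALPHA_HIGH = (3 + math.sqrt(3)) / 6

if __name__ == "__main__":
    for start in (0.10, 0.40, 0.50, 0.60, 0.90):
        print(f"alpha = {start:.2f} -> after 10 stages: {iterate(g, start, 10):.6f}")
    gain = ALPHA_LOW - g(ALPHA_LOW)
    print(f"gain per stage at alpha = {ALPHA_LOW:.4f}: {gain:.4f}")
```

Starting levels below 1/2 drift toward 0 and levels above 1/2 drift toward 1, while 1/2 itself stays fixed, in agreement with the discussion above.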



Fig. 33 



9.3 Other basic organs. 

We have so far assumed that the basic components of the construction are majority 
organs. From these, an analogue of the majority organ - one which picked out 
a majority of bundles instead of a majority of single lines - was constructed. Since 
this, when viewed as a basic organ, is a universal organ, these considerations show 
that it is at least theoretically possible to construct any network with bundles instead 
of single lines. However there was no necessity for starting from majority organs. 
Indeed, any other basic system whose universality was established in Section 4 can 
be used instead. The simplest procedure in such a case is to construct an (essential) 
equivalent of the (single line) majority organ from the given basic system (cf. 4.2.2 
(p. 11)), and then proceed with this composite majority organ in the same way as 
was done above with the basic majority organ. 

Thus, if the basic organs are those Nos. one and two in Figure 10 (p. 10) (cf. 
the relevant discussion in 4.1.2 (p. 9)), then the basic synthesis (that of the majority 
organ, cf. above) is immediately derivable from the introductory formula of Figure 14 
(p. 13). 

9.4 The Scheffer stroke. 

9.4.1 The executive organ. 


Similarly, it is possible to construct the entire mechanism starting from the Scheffer 
organ of Figure 12 (p. 12). In this case, however, it is simpler not to effect the 
passage to an (essential) equivalent of the majority organ (as suggested above), but 
to start de novo. Actually, the same procedure, which was seen above to work for 
the majority organ, works mutatis mutandis for the Scheffer organ, too. A brief 
description of the direct procedure in this case is given in what follows: 

Again, one begins by constructing a network which will perform the task of the 
Scheffer organ for bundles of inputs and outputs instead of single lines. This is shown 
in Figure 34 (p. 34) for bundles of five wires. (The connections are replaced by suitable 
markings, as in Figures 29 (p. 28) and 30 (p. 30).) 

It is intuitively clear that if almost all lines of both input bundles are stimulated, 
then almost none of the lines of the output bundle will be stimulated. Similarly, if 
almost none of the lines of one input bundle are stimulated, then almost all lines of 
the output bundle will be stimulated. In addition to this overall behavior, the following 
detailed behavior is found (cf. the detailed consideration in 10.4 (p. 42)). If the 
condition of the organ is one of prevalent non-stimulation of the output bundle, and 
hence is governed by (prevalent stimulation of) both input bundles, then the most 
probable level of the output error will be (approximately) the sum of the errors in 
the two governing input bundles; if on the other hand the condition of the organ is 
one of prevalent stimulation of the output bundle, and hence is governed by (prevalent 
non-stimulation of) one or of both input bundles, then the output error will be 
on (approximately) the same level as the input error, if (only) one input bundle is 
governing (i.e. prevalently non-stimulated), and it will be generally smaller than the 
input error, if both input bundles were governing (i.e. prevalently non-stimulated). 
Thus two significant inputs may produce a result lying in the intermediate zone of 
uncertain information. Hence a restoring organ (for the error level) is again needed, 
in addition to the executive organ. 

9.4.2 The restoring organ. 

Again the above indicates that the restoring organ can be obtained from a special 
case functioning of the standard executive organ, namely by obtaining all inputs from 
a single input bundle, and seeing to it that the output bundle has the same size as the 
original input bundle. The principle is illustrated by Figure 35 (p. 36). The “black 
box” U is again supposed to effect a suitable permutation of the lines that pass through 
it, for the same reasons and in the same manner as in the corresponding situation for 
the majority organ (cf. Figure 32 (p. 33)). I.e. it must have a “randomizing” effect. 

If $\alpha N$ of the $N$ incoming lines are stimulated, then the probability of any Scheffer 
organ being stimulated (by at least one non-stimulated input) is 

(15)    $\alpha^+ = 1 - \alpha^2 = h(\alpha)$. 

Thus approximately (i.e. with high probability, provided $N$ is large) $\sim \alpha^+ N$ outputs 
will be excited. Plotting the curve of $\alpha^+$ against $\alpha$ discloses some characteristic 
differences against the previous case (that one of the majority organs, i.e. 
$\alpha^* = 3\alpha^2 - 2\alpha^3 = g(\alpha)$, cf. 9.2.3 (p. 31)), which require further 
discussion. This curve is shown in Figure 36 (p. 36). Clearly $\alpha^+$ is an antimonotone 
function of $\alpha$, i.e. instead of restoring an excitation level (i.e. bringing it closer to 
0 or to 1, respectively), it transforms it into its opposite (i.e. it brings the neighborhood 
of 0 close to 1, and the neighborhood of 1 close to 0). In addition it produces for $\alpha$ 
near to 1 an $\alpha^+$ less near to 0 (about twice farther), but for $\alpha$ near to 0 an 
$\alpha^+$ much nearer to 1 (second order!). All these circumstances suggest that the 
operation should be iterated. 

Let the restoring organ therefore consist of two of the previously pictured organs in 
series, as shown in Figure 37 (p. 36). (The “black boxes” $U_1$, $U_2$ play the same role as 
their analog $U$ plays in Figure 35 (p. 36).) This organ transforms an input excitation 
level $\alpha N$ into an output excitation level of approximately (cf. above) 
$\sim \alpha^{++} N$ where 

        $\alpha^{++} = 1 - (1 - \alpha^2)^2 = h(h(\alpha)) = k(\alpha)$, 

i.e. 

(16)    $\alpha^{++} = 2\alpha^2 - \alpha^4 = k(\alpha)$. 



This curve of $\alpha^{++}$ against $\alpha$ is shown in Figure 38 (p. 37). This curve is very 
similar to that one obtained for the majority organ (i.e. 
$\alpha^* = 3\alpha^2 - 2\alpha^3 = g(\alpha)$, cf. 9.2.3 (p. 31)). Indeed: The curve intersects 
the diagonal $\alpha^{++} = \alpha$ in the interval $0 \le \alpha^{++} \le 1$ three times: For 
$\alpha = 0, \alpha_0, 1$, where $\alpha_0 = (-1 + \sqrt{5})/2 = .618$. (There 
is a fourth intersection $\alpha = -1 - \alpha_0 = -1.618$, but this is irrelevant, since it is 
not in the interval $0 \le \alpha \le 1$.) $0 < \alpha < \alpha_0$ implies 
$0 < \alpha^{++} < \alpha$; $\alpha_0 < \alpha < 1$ implies $\alpha < \alpha^{++} < 1$. 

In other words: The role of the error levels $\alpha \sim 0$ and $\alpha \sim 1$ is precisely the 
same as for the majority organ (cf. 9.2.3 (p. 31)), except that the limit between their 
respective areas of control lies at $\alpha = \alpha_0$ instead of at $\alpha = \frac12$. I.e. 
the process $\alpha \to \alpha^{++}$ brings every $\alpha$ nearer to either 0 or to 1, but the 
preference to 0 or to 1 is settled at a discrimination level of 61.8% (i.e. $\alpha_0$) 
instead of one of 50% (i.e. $\frac12$). Thus, apart from a certain asymmetric distortion, the 
organ behaves like its counterpart considered for the majority organ - i.e. is an effective 
restoring mechanism. 
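
As an illustration, the two-tier map of (16) can be iterated numerically in the same way as the majority map. This is a sketch under the formulas (15) and (16) only; the helper names `h`, `k`, `restore` and `ALPHA_0` are introduced here and do not occur in the text.

```python
import math


def h(alpha: float) -> float:
    """Formula (15): one tier of Scheffer organs fed from a single bundle."""
    return 1 - alpha**2


def k(alpha: float) -> float:
    """Formula (16): two tiers in series, k(a) = h(h(a)) = 2a^2 - a^4."""
    return h(h(alpha))


# Discrimination level: the fixed point of k strictly between 0 and 1.
ALPHA_0 = (math.sqrt(5) - 1) / 2


def restore(alpha: float, stages: int) -> float:
    """Pass an excitation level through `stages` two-tier restoring organs."""
    for _ in range(stages):
        alpha = k(alpha)
    return alpha


if __name__ == "__main__":
    print(f"alpha_0 = {ALPHA_0:.3f}, k(alpha_0) = {k(ALPHA_0):.3f}")
    for start in (0.55, 0.65):
        print(f"alpha = {start:.2f} -> after 15 stages: {restore(start, 15):.6f}")
```

A level of 0.55, although above 1/2, lies below $\alpha_0$ and is therefore driven toward 0; this is the asymmetric distortion referred to above.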


10. ERROR IN MULTIPLEX SYSTEMS. 

10.1 General remarks. 

In Section 9 (p. 30) the technique for constructing multiplexed automata was 
described. However the role of errors entered at best intuitively and summarily, and 
therefore it has still not been proved that these systems will do what is claimed for 
them — namely control error. Section 10 (p. 37) is devoted to a sketch of the statistical 
analysis necessary to show that, by using large enough bundles of lines, any desired 
degree of accuracy (i.e. as small a probability of malfunction of the ultimate output 
of the network as desired) can be obtained with a multiplexed automaton. 

For simplicity, we will only consider automata which are constructed from the 
Scheffer organs. These are easier to analyze since they involve only two inputs. At 
the same time, the Scheffer organ is (by itself) universal (cf. 4.2.1 (p. 11)), hence every 
automaton is essentially equivalent to a network of Scheffer organs. 

Errors in the operation of an automaton arise from two sources. First, the individual 
basic organs can make mistakes. It will be assumed as before, that, under any 
circumstance, the probability of this happening is just $\epsilon$. Any operation on the bundle 
can be considered as a random sampling of size $N$ ($N$ being the size of the bundle). 
The number of errors committed by the individual basic organs in any operation on 
the bundle is then a random variable, distributed approximately normally with mean 
$\epsilon N$ and standard deviation $\sqrt{\epsilon(1-\epsilon)N}$. A second source of failures 
arises because in operating with bundles which are not all in the same state of stimulation 
or non-stimulation, the possibility of multiplying error by unfortunate combinations of lines 
into the basic (single line) organs is always present. This interacts with the statistical 
effects, and in particular with the processes of degeneration and of restoration of 
which we spoke in 9.2.2 (p. 31), 9.2.3 (p. 31) and 9.4.2 (p. 34). 

10.2 The distribution of the response set size. 

10.2.1 Exact theory. 

In order to give a statistical treatment of the problem, consider Figure 34 
(p. 34), showing a network of Scheffer organs, which was discussed in 9.4.1 (p. 34). 
Let again $N$ be the number of lines in each (input or output) bundle. Let $X$ be the 
set of those $i = 1, \ldots, N$ for which line No. $i$ in the first input bundle is stimulated at 
time $t$; let $Y$ be the corresponding set for the second input bundle at time $t$; and let 
$Z$ be the corresponding set for the output bundle, assuming the correct functioning 
of all the Scheffer organs involved, at time $t + 1$. Let $X$, $Y$ have $\xi N$, $\eta N$ elements, 
respectively, but otherwise be random - i.e. equidistributed over all pairs of sets with 
these numbers of elements. What can then be said about the number of elements $\zeta N$ 
of $Z$? Clearly $\xi, \eta, \zeta$ are the relative levels of excitation of the two input bundles and 
of the output bundle, respectively, of the network under consideration. The question 
is then: what is the distribution of the (stochastic) variable $\zeta$ in terms of the (given) 
$\xi, \eta$? 

Let $W$ be the complementary set of $Z$. Let $p, q, r$ be the numbers of elements of 
$X, Y, W$, respectively, so that $p = \xi N$, $q = \eta N$, $r = (1 - \zeta)N$. Then the problem 
is to determine the distribution of the (stochastic) variable $r$ in terms of the (given) 
$p, q$ - i.e. the probability of any given $r$ in combination with any given $p, q$. 

$W$ is clearly the intersection of the sets $X, Y$: $W = X \cdot Y$. Let $U, V$ be the 
(relative) complements of $W$ in $X, Y$ respectively: $U = X - W$, $V = Y - W$, and let 
$S$ be the (absolute, i.e. in the set $(1, \ldots, N)$) complement of the sum of $X$ and 
$Y$: $S = -(X + Y)$. Then $W, U, V, S$ are pairwise disjoint sets making up together 
precisely the entire set $(1, \ldots, N)$, with $r$, $p - r$, $q - r$, $N - p - q + r$ elements, 
respectively. Apart from this they are unrestricted. Thus they offer together 
$N!/[r!(p-r)!(q-r)!(N-p-q+r)!]$ possible choices. Since there are a priori 
$N!/[p!(N-p)!]$ possible choices of an $X$ with $p$ elements and a priori $N!/[q!(N-q)!]$ 
possible choices of a $Y$ with $q$ elements, this means that the looked for probability of 
$W$ having $r$ elements is 

        $\rho = \frac{N!}{r!(p-r)!(q-r)!(N-p-q+r)!} \Big/ \left( \frac{N!}{p!(N-p)!} \cdot \frac{N!}{q!(N-q)!} \right) = \frac{p!(N-p)!\,q!(N-q)!}{r!(p-r)!(q-r)!(N-p-q+r)!\,N!}$. 

Note that this formula also shows that $\rho = 0$ when $r < 0$ or $p - r < 0$ or $q - r < 0$ 
or $N - p - q + r < 0$, i.e. when $r$ violates the conditions 

        $\max(0, p + q - N) \le r \le \min(p, q)$. 

This is clear combinatorially, in view of the meaning of $X$, $Y$ and $W$. In terms of 
$\xi, \eta, \zeta$ the above conditions become 

(17)    $1 - \max(0, \xi + \eta - 1) \ge \zeta \ge 1 - \min(\xi, \eta)$. 
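
The exact probability can be evaluated directly for moderate $N$. The sketch below is an illustration, not part of the original: it rewrites the factorial expression for $\rho$ as the equivalent product of binomial coefficients $\binom{p}{r}\binom{N-p}{q-r}/\binom{N}{q}$, and the function names are assumptions made here. Exact rational arithmetic is used so that normalization and the mean of $\zeta$ can be checked without rounding.

```python
from fractions import Fraction
from math import comb, sqrt


def rho(n: int, p: int, q: int, r: int) -> Fraction:
    """Exact probability that W (the intersection of X and Y) has r elements."""
    if r < max(0, p + q - n) or r > min(p, q):
        return Fraction(0)  # outside the admissible range for r
    # p!(N-p)! q!(N-q)! / [r!(p-r)!(q-r)!(N-p-q+r)! N!] as binomial coefficients
    return Fraction(comb(p, r) * comb(n - p, q - r), comb(n, q))


def mean_zeta(n: int, p: int, q: int) -> Fraction:
    """Exact mean of zeta = 1 - r/N."""
    return sum((1 - Fraction(r, n)) * rho(n, p, q, r) for r in range(n + 1))


def dispersion_zeta(n: int, p: int, q: int) -> float:
    """Exact standard deviation of zeta = 1 - r/N."""
    mean = mean_zeta(n, p, q)
    var = sum((1 - Fraction(r, n) - mean) ** 2 * rho(n, p, q, r) for r in range(n + 1))
    return sqrt(var)


if __name__ == "__main__":
    n, p, q = 50, 30, 20  # xi = 0.6, eta = 0.4 (illustrative values)
    print("total probability:", sum(rho(n, p, q, r) for r in range(n + 1)))
    print("mean of zeta:", mean_zeta(n, p, q), "vs 1 - xi*eta =", 1 - Fraction(p * q, n * n))
    print("dispersion of zeta:", dispersion_zeta(n, p, q))
```

The mean of $\zeta$ comes out as exactly $1 - \xi\eta$, which is the value that the asymptotic treatment below singles out.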


Returning to the expression for $\rho$, substituting the $\xi, \eta, \zeta$ expressions for 
$p, q, r$ and using Stirling’s formula for the factorials involved, gives 

(18)    $\rho \sim \frac{1}{\sqrt{2\pi N}} \sqrt{\alpha}\, e^{-\theta N}$, 

where 

        $\alpha = \frac{\xi(1-\xi)\eta(1-\eta)}{(\zeta+\xi-1)(\zeta+\eta-1)(1-\zeta)(2-\xi-\eta-\zeta)}$, 

        $\theta = (\zeta+\xi-1)\ln(\zeta+\xi-1) + (\zeta+\eta-1)\ln(\zeta+\eta-1)$ 
                $+ (1-\zeta)\ln(1-\zeta) + (2-\xi-\eta-\zeta)\ln(2-\xi-\eta-\zeta)$ 
                $- \xi\ln\xi - (1-\xi)\ln(1-\xi) - \eta\ln\eta - (1-\eta)\ln(1-\eta)$. 

From this 

        $\frac{\partial\theta}{\partial\zeta} = \ln\frac{(\zeta+\xi-1)(\zeta+\eta-1)}{(1-\zeta)(2-\xi-\eta-\zeta)}$, 

        $\frac{\partial^2\theta}{\partial\zeta^2} = \frac{1}{\zeta+\xi-1} + \frac{1}{\zeta+\eta-1} + \frac{1}{1-\zeta} + \frac{1}{2-\xi-\eta-\zeta}$. 

Hence $\theta = 0$, $\partial\theta/\partial\zeta = 0$ for $\zeta = 1 - \xi\eta$, and 
$\partial^2\theta/\partial\zeta^2 > 0$ for all $\zeta$ (in its entire interval 
of variability according to (17)). Consequently $\theta > 0$ for all $\zeta \ne 1 - \xi\eta$ 
(within the interval (17)). This implies, in view of (18), that for all $\zeta$ which are 
significantly $\ne 1 - \xi\eta$, $\rho$ tends to 0 very rapidly as $N$ gets large. It suffices 
therefore to evaluate (18) for $\zeta \sim 1 - \xi\eta$. Now 
$\alpha = 1/[\xi(1-\xi)\eta(1-\eta)]$, 
$\partial^2\theta/\partial\zeta^2 = 1/[\xi(1-\xi)\eta(1-\eta)]$ for 
$\zeta = 1 - \xi\eta$. Hence 

        $\alpha \sim \frac{1}{\xi(1-\xi)\eta(1-\eta)}$, 

        $\theta \sim \frac{(\zeta - (1-\xi\eta))^2}{2\xi(1-\xi)\eta(1-\eta)}$ 

for $\zeta \sim 1 - \xi\eta$. Therefore 

(19)    $\rho \sim \frac{1}{\sqrt{2\pi\xi(1-\xi)\eta(1-\eta)N}}\, e^{-\frac{(\zeta-(1-\xi\eta))^2}{2\xi(1-\xi)\eta(1-\eta)} N}$ 

is an acceptable approximation for $\rho$. 

$r$ is an integer-valued variable, hence $\zeta = 1 - \frac{r}{N}$ is a rational-valued variable, 
with the fixed denominator $N$. Since $N$ is assumed to be very large, the range of $\zeta$ is very 
dense. It is therefore permissible to replace it by a continuous one, and to describe the 
distribution of $\zeta$ by a probability-density $\sigma$. $\rho$ is the probability of a single 
value of $\zeta$, and since the values of $\zeta$ are equidistant, with a separation 
$d\zeta = 1/N$, the relation between $\sigma$ and $\rho$ is best defined by 
$\sigma\, d\zeta = \rho$, i.e. $\sigma = \rho N$. Therefore (19) becomes 

(20)    $\sigma \sim \frac{1}{\sqrt{2\pi}\,\sqrt{\xi(1-\xi)\eta(1-\eta)/N}}\, e^{-\frac12 \left( \frac{\zeta-(1-\xi\eta)}{\sqrt{\xi(1-\xi)\eta(1-\eta)/N}} \right)^2}$. 

This formula means obviously the following: 

$\zeta$ is approximately normally distributed, with the mean $1 - \xi\eta$ and the dispersion 
$\sqrt{\xi(1-\xi)\eta(1-\eta)/N}$. Note, that the rapid decrease of the normal distribution 
function (i.e. the right hand side of (20)) with $N$ (which is exponential!) is 
valid as long as $\zeta$ is near to $1 - \xi\eta$; only the coefficient of $N$ (in the exponent, 
i.e. $-\frac12 [(\zeta - (1-\xi\eta))/\sqrt{\xi(1-\xi)\eta(1-\eta)}]^2$) is somewhat altered as 
$\zeta$ deviates from $1 - \xi\eta$. (This follows from the discussion of $\theta$ given above.) 

The simple statistical discussion of 9.4 (p. 34) amounted to attributing to $\zeta$ the 
unique value $1 - \xi\eta$. We see now that this is approximately true: 

(21)    $\zeta = (1 - \xi\eta) + \sqrt{\xi(1-\xi)\eta(1-\eta)/N}\;\delta$, 
        $\delta$ is a stochastic variable, normally distributed, with mean 0 
        and dispersion 1. 


10.2.2 Theory with errors. 

We must now pass from $r, \zeta$, which postulate faultless functioning of all Scheffer 
organs in the network, to $r', \zeta'$, which correspond to the actual functioning of all these 
organs - i.e. to a probability $\epsilon$ of error on each functioning. Among the $r$ organs each 
of which should correctly stimulate its output, each error reduces $r'$ by one unit. The 
number of errors here is approximately normally distributed, with the mean $\epsilon r$ and 
the dispersion $\sqrt{\epsilon(1-\epsilon)r}$ (cf. the remark made in 10.1 (p. 37)). Among the 
$N - r$ organs, each of which should correctly not stimulate its output, each error increases 
$r'$ by one unit. The number of errors here is again approximately normally distributed, 
with the mean $\epsilon(N - r)$ and the dispersion $\sqrt{\epsilon(1-\epsilon)(N-r)}$ (cf. as 
above). Thus $r' - r$ is the difference of these two (independent) stochastic variables. Hence 
it, too, is approximately normally distributed, with the mean 
$-\epsilon r + \epsilon(N - r) = \epsilon(N - 2r)$, and the dispersion 

        $\sqrt{\left(\sqrt{\epsilon(1-\epsilon)r}\right)^2 + \left(\sqrt{\epsilon(1-\epsilon)(N-r)}\right)^2} = \sqrt{\epsilon(1-\epsilon)N}$. 

I.e. (approximately) 

        $r' = r + 2\epsilon\left(\frac{N}{2} - r\right) + \sqrt{\epsilon(1-\epsilon)N}\;\delta'$, 

where $\delta'$ is normally distributed with the mean 0 and the dispersion 1. From this 

        $\zeta' = \zeta + 2\epsilon\left(\frac12 - \zeta\right) - \sqrt{\epsilon(1-\epsilon)/N}\;\delta'$, 

and then by (21) 

        $\zeta' = (1 - \xi\eta) + 2\epsilon\left(\xi\eta - \frac12\right)$ 
                $+ (1 - 2\epsilon)\sqrt{\xi(1-\xi)\eta(1-\eta)/N}\;\delta$ 
                $- \sqrt{\epsilon(1-\epsilon)/N}\;\delta'$. 

Clearly $(1 - 2\epsilon)\sqrt{\xi(1-\xi)\eta(1-\eta)/N}\;\delta - \sqrt{\epsilon(1-\epsilon)/N}\;\delta'$, 
too, is normally distributed, with the mean 0 and the dispersion 

        $\sqrt{\left((1-2\epsilon)\sqrt{\xi(1-\xi)\eta(1-\eta)/N}\right)^2 + \left(\sqrt{\epsilon(1-\epsilon)/N}\right)^2}$ 
        $= \sqrt{\left((1-2\epsilon)^2\xi(1-\xi)\eta(1-\eta) + \epsilon(1-\epsilon)\right)/N}$. 

Hence (21) becomes at last (we write again $\zeta$ in place of $\zeta'$): 

(22)    $\zeta = (1 - \xi\eta) + 2\epsilon\left(\xi\eta - \frac12\right) + \sqrt{\left((1-2\epsilon)^2\xi(1-\xi)\eta(1-\eta) + \epsilon(1-\epsilon)\right)/N}\;\delta^*$, 
        $\delta^*$ is a stochastic variable, normally distributed, with 
        the mean 0 and the dispersion 1. 
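
The mean and dispersion asserted by (22) can be compared with a direct simulation of one bundle operation. The following is a rough Monte Carlo sketch under the assumptions stated in the text (random sets $X$, $Y$ of prescribed sizes, independent organ errors of probability $\epsilon$); the function names, the seed and the parameter values are illustrative choices made here.

```python
import math
import random
import statistics


def output_level(n, xi, eta, eps, rng):
    """One trial: relative excitation of the output bundle of n Scheffer organs."""
    x = set(rng.sample(range(n), round(xi * n)))   # stimulated lines, first bundle
    y = set(rng.sample(range(n), round(eta * n)))  # stimulated lines, second bundle
    stimulated = 0
    for i in range(n):
        correct = not (i in x and i in y)           # Scheffer stroke: "not both"
        fires = (not correct) if rng.random() < eps else correct
        stimulated += fires
    return stimulated / n


def simulate(n, xi, eta, eps, trials, seed=0):
    """Empirical mean and standard deviation of the output level."""
    rng = random.Random(seed)
    levels = [output_level(n, xi, eta, eps, rng) for _ in range(trials)]
    return statistics.mean(levels), statistics.pstdev(levels)


def predicted(n, xi, eta, eps):
    """Mean and dispersion according to formula (22)."""
    mean = (1 - xi * eta) + 2 * eps * (xi * eta - 0.5)
    disp = math.sqrt(((1 - 2 * eps) ** 2 * xi * (1 - xi) * eta * (1 - eta)
                      + eps * (1 - eps)) / n)
    return mean, disp


if __name__ == "__main__":
    args = (1000, 0.9, 0.8, 0.005)
    print("simulated:", simulate(*args, trials=300))
    print("formula (22):", predicted(*args))
```

Setting `eps` to 0 reduces the comparison to formula (21).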


10.3 The restoring organ. 

This discussion equally covers the situations that are dealt with in Figures 35 
(p. 36) and 37 (p. 36), showing networks of Scheffer organs in 9.4.2 (p. 34). 

Consider first Figure 35 (p. 36). We have here a single input bundle of $N$ lines, 
and an output bundle of $N$ lines. However, the two-way split and the subsequent 
“randomizing” permutation produce an input bundle of $2N$ lines and (to the right of 
U) the even lines of this bundle on one hand, and its odd lines on the other hand, may 
be viewed as two input bundles of $N$ lines each. Beyond this point the network is the 
same as that one of Figure 34 (p. 34), discussed in 9.4.1 (p. 34). If the original input 
bundle had $\xi N$ stimulated lines, then each one of the two derived input bundles 
will also have $\xi N$ stimulated lines. (To be sure of this, it is necessary to choose 
the “randomizing” permutation U of Figure 35 (p. 36) in such a manner, that it 
permutes the even lines among each other, and the odd lines among each other. This 
is compatible with its “randomizing” the relationship of the family of all even lines 
to the family of all odd lines. Hence it is reasonable to expect that this requirement 
does not conflict with the desired randomizing character of the permutation.) Let the 
output bundle have $\zeta N$ stimulated lines. Then we are clearly dealing with the same 
case as in (22), except that it is specialized to $\xi = \eta$. 

Hence (22) becomes: 

(23)    $\zeta = (1 - \xi^2) + 2\epsilon\left(\xi^2 - \frac12\right) + \sqrt{\left((1-2\epsilon)^2(\xi(1-\xi))^2 + \epsilon(1-\epsilon)\right)/N}\;\delta^*$, 
        $\delta^*$ is a stochastic variable, normally distributed, with the mean 0 
        and the dispersion 1. 

Consider next Figure 37 (p. 36). Three bundles are relevant here: The input 
bundle at the extreme left, the intermediate bundle issuing directly from the first 
tier of Scheffer organs, and the output bundle issuing directly from the second tier 
of Scheffer organs, i.e. at the extreme right. Each one of these three bundles consists 
of $N$ lines. Let the number of stimulated lines in each bundle be $\zeta N$, $\omega N$, $\psi N$, 
respectively. Then (23) above applies, with its $\xi, \zeta$ replaced first by $\zeta, \omega$, and 
second by $\omega, \psi$: 

(24)    $\omega = (1 - \zeta^2) + 2\epsilon\left(\zeta^2 - \frac12\right) + \sqrt{\left((1-2\epsilon)^2(\zeta(1-\zeta))^2 + \epsilon(1-\epsilon)\right)/N}\;\delta^{**}$, 
        $\psi = (1 - \omega^2) + 2\epsilon\left(\omega^2 - \frac12\right) + \sqrt{\left((1-2\epsilon)^2(\omega(1-\omega))^2 + \epsilon(1-\epsilon)\right)/N}\;\delta^{***}$, 
        $\delta^{**}, \delta^{***}$ are stochastic variables, independently and normally 
        distributed, with the mean 0 and the dispersion 1. 

10.4 Qualitative evaluation of the results. 

In what follows, (22) and (24) will be relevant - i.e. the Scheffer organ networks 
of Figures 34 (p. 34) and 37 (p. 36). 

Before going into these considerations however, we have to make an observation 
concerning (22). (22) shows that the (relative) excitation levels $\xi, \eta$ on the input 
bundles of its network generate approximately (i.e. for large $N$ and small $\epsilon$) the 
(relative) excitation level $\zeta_0 = 1 - \xi\eta$ on the output bundle of that network. This 
justifies the statements made in 9.4.1 (p. 34) about the detailed functioning of the 
network. Indeed: if the two input bundles are both prevalently stimulated, i.e. if 
$\xi \sim 1$, $\eta \sim 1$, then the distance of $\zeta_0$ from 0 is about the sum of the distances 
of $\xi$ and of $\eta$ from 1: $\zeta_0 = (1 - \xi) + \xi(1 - \eta)$. If one of the two input bundles, 
say the first one, is prevalently non-stimulated, while the other one is prevalently 
stimulated, i.e. if $\xi \sim 0$, $\eta \sim 1$, then the distance of $\zeta_0$ from 1 is about the 
distance of $\xi$ from 0: $1 - \zeta_0 = \xi\eta$. If both input bundles are prevalently 
non-stimulated, i.e. if $\xi \sim 0$, $\eta \sim 0$, then the distance of $\zeta_0$ from 1 is small 
compared to the distances of both $\xi$ and $\eta$ from 0: $1 - \zeta_0 = \xi\eta$. 

10.5 Complete quantitative theory. 

10.5.1 General results. 

We can now pass to the complete statistical analysis of the Scheffer stroke operation 
on bundles. In order to do this, we must agree on a systematic way to handle 
this operation by a network. The system to be adopted will be the following: the 
necessary executive organ will be followed in series by a restoring organ. I.e. the 
Scheffer organ network of Figure 34 (p. 34) will be followed in series by the Scheffer 
organ network of Figure 37 (p. 36). This means that the formulas of (22) are to be 
followed by those of (24). Thus $\xi, \eta$ are the excitation levels of the two input bundles, 
$\psi$ is the excitation level of the output bundle, and we have: 

(25)    $\zeta = (1 - \xi\eta) + 2\epsilon\left(\xi\eta - \frac12\right) + \sqrt{\left((1-2\epsilon)^2\xi(1-\xi)\eta(1-\eta) + \epsilon(1-\epsilon)\right)/N}\;\delta^*$, 
        $\omega = (1 - \zeta^2) + 2\epsilon\left(\zeta^2 - \frac12\right) + \sqrt{\left((1-2\epsilon)^2(\zeta(1-\zeta))^2 + \epsilon(1-\epsilon)\right)/N}\;\delta^{**}$, 
        $\psi = (1 - \omega^2) + 2\epsilon\left(\omega^2 - \frac12\right) + \sqrt{\left((1-2\epsilon)^2(\omega(1-\omega))^2 + \epsilon(1-\epsilon)\right)/N}\;\delta^{***}$, 
        $\delta^*, \delta^{**}, \delta^{***}$ are stochastic variables, independently and normally 
        distributed, with the mean 0 and the dispersion 1. 

Consider now a given fiduciary level $\Delta$. Then we need a behavior, like the “correct” 
one of the Scheffer stroke, with an overwhelming probability. This means: the 
implication of $\psi \le \Delta$ by $\xi \ge 1 - \Delta$, $\eta \ge 1 - \Delta$; the implication of 
$\psi \ge 1 - \Delta$ by $\xi \le \Delta$, $\eta \ge 1 - \Delta$; the implication of 
$\psi \ge 1 - \Delta$ by $\xi \le \Delta$, $\eta \le \Delta$. (We are, of course, 
using the symmetry in $\xi, \eta$.) 

This may, of course, only be expected for $N$ sufficiently large and $\epsilon$ sufficiently 
small. In addition, it will be necessary to make an appropriate choice of the fiduciary 
level $\Delta$. 

If $N$ is so large and $\epsilon$ is so small that all terms in (25) containing factors 
$1/\sqrt{N}$ and $\epsilon$ can be neglected, then the above desired “overwhelmingly probable 
inferences” become even strictly true, if $\Delta$ is small enough. Indeed, then (25) gives 
$\zeta = \zeta_0 = 1 - \xi\eta$, $\omega = \omega_0 = 1 - \zeta^2$, $\psi = \psi_0 = 1 - \omega^2$, i.e. 
$\psi = 1 - (2\xi\eta - (\xi\eta)^2)^2$. Now it is easy to verify $\psi = O(\Delta^2)$ for 
$\xi \ge 1 - \Delta$, $\eta \ge 1 - \Delta$; $\psi = 1 - O(\Delta^2)$ for 
$\xi \le \Delta$, $\eta \ge 1 - \Delta$; $\psi = 1 - O(\Delta^4)$ for $\xi \le \Delta$, $\eta \le \Delta$. 
Hence sufficiently small $\Delta$ will guarantee the desiderata stated further above. 
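
The error-free limit just described is easy to check numerically. The sketch below composes the executive organ with the two-tier restoring organ, neglecting all terms in $1/\sqrt{N}$ and $\epsilon$; the function names and the trial value of the fiduciary level are assumptions made here for illustration.

```python
def executive(xi: float, eta: float) -> float:
    """zeta_0 = 1 - xi*eta: Scheffer stroke on two bundles, errors neglected."""
    return 1 - xi * eta


def restoring(zeta: float) -> float:
    """Two tiers of the restoring organ: omega_0 = 1 - zeta^2, psi_0 = 1 - omega_0^2."""
    omega = 1 - zeta**2
    return 1 - omega**2


def psi0(xi: float, eta: float) -> float:
    """Executive organ followed in series by the restoring organ."""
    return restoring(executive(xi, eta))


def psi0_closed(xi: float, eta: float) -> float:
    """Closed form given in the text: 1 - (2*xi*eta - (xi*eta)^2)^2."""
    return 1 - (2 * xi * eta - (xi * eta) ** 2) ** 2


if __name__ == "__main__":
    delta = 0.07  # trial fiduciary level
    print("both inputs stimulated:    ", psi0(1 - delta, 1 - delta))
    print("one input non-stimulated:  ", psi0(delta, 1.0))
    print("both inputs non-stimulated:", psi0(delta, delta))
```

In each of the three cases the output lies on the correct side of the fiduciary level, and in the last case it is much closer to 1 than in the second.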

10.5.2 Numerical evaluation. 

Consider next the case of a fixed, finite $N$ and a fixed, positive $\epsilon$. Then a more 
elaborate calculation must be based on the complete formulae of (25). This calculation 
will not be carried out here, but its results will be described. 

The most favorable fiduciary level $\Delta$, from the point of view of this calculation, 
turns out to be $\Delta = .07$. I.e. stimulation of at least 93% of the lines of a bundle 
represents a positive message; stimulation of at most 7% of the lines of a bundle 
represents a negative message; the interval between 7% and 93% is a zone of uncertainty, 
indicating an effective malfunction of the network. 

Having established this fiduciary level, there exists also an upper bound for the 
allowable values of $\epsilon$. This is $\epsilon = .0107$. In other words if $\epsilon > .0107$, the 
risk of effective malfunction of the network will be above a fixed, positive lower bound, no 
matter how large a bundle size $N$ is used. The calculations were therefore continued 
with a specific $\epsilon < .0107$, namely, with $\epsilon = .005$. 

With these assumptions, then, the calculation yields an estimate for the probability 
of malfunction of the network, i.e. of the violation of the desiderata stated further 
above. As is to be expected, this estimate is given by an error integral. This is 

(26)    $\rho(N) = \frac{1}{\sqrt{2\pi}} \int_k^\infty e^{-\frac12 x^2}\,dx \approx \frac{1}{\sqrt{2\pi}\,k}\, e^{-\frac12 k^2}$, 

where 

        $k = .062\sqrt{N}$ 

expresses, in a certain sense, the total allowable error divided by a composite standard 
deviation. The approximation is of course valid only for large $N$. It can also be 
written in the form 

(27)    $\rho(N) \approx \frac{6.4}{\sqrt{N}}\, 10^{-\frac{8.6 N}{10{,}000}}$. 

The following table gives a better idea of the dependency expressed by the formula:* 

    $N$ = number of lines in a bundle      $\rho(N)$ = probability of malfunction 

                 1,000                          $2.7 \times 10^{-2}$ 
                 2,000                          $2.6 \times 10^{-3}$ 
                 3,000                          $2.5 \times 10^{-4}$ 
                 5,000                          $4.0 \times 10^{-6}$ 
                10,000                          $1.6 \times 10^{-10}$ 
                20,000                          $2.8 \times 10^{-19}$ 
                25,000                          $1.2 \times 10^{-23}$ 

Notice that for as many as 1000 lines in a bundle, the reliability (about 3%) is rather 
poor. (Indeed it is inferior to the $\epsilon = .005$, i.e. 1/2%, that we started with.) However, 
a 25 fold increase in this size gives a very good reliability. 
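
For comparison, the three expressions (the error integral of (26), its asymptotic form, and formula (27)) can be tabulated next to the values above. This is a sketch for illustration only: the function names are chosen here, the tail integral is evaluated through the complementary error function, and the `TABLE` values are simply copied from the table.

```python
import math

# Table values copied from the text: bundle size N -> probability of malfunction.
TABLE = {1_000: 2.7e-2, 2_000: 2.6e-3, 3_000: 2.5e-4, 5_000: 4.0e-6,
         10_000: 1.6e-10, 20_000: 2.8e-19, 25_000: 1.2e-23}


def k_of(n: float) -> float:
    """k = .062 * sqrt(N), as in (26)."""
    return 0.062 * math.sqrt(n)


def rho_integral(n: float) -> float:
    """Normal tail integral of (26): (1/sqrt(2 pi)) * integral_k^inf exp(-x^2/2) dx."""
    return 0.5 * math.erfc(k_of(n) / math.sqrt(2))


def rho_asymptotic(n: float) -> float:
    """Large-k approximation of (26): exp(-k^2/2) / (sqrt(2 pi) * k)."""
    k = k_of(n)
    return math.exp(-0.5 * k * k) / (math.sqrt(2 * math.pi) * k)


def rho_27(n: float) -> float:
    """Formula (27): (6.4 / sqrt(N)) * 10^(-8.6 N / 10,000)."""
    return 6.4 / math.sqrt(n) * 10 ** (-8.6 * n / 10_000)


if __name__ == "__main__":
    print(f"{'N':>7} {'table':>10} {'integral':>10} {'asymptotic':>10} {'(27)':>10}")
    for n, p in TABLE.items():
        print(f"{n:>7} {p:>10.1e} {rho_integral(n):>10.1e} "
              f"{rho_asymptotic(n):>10.1e} {rho_27(n):>10.1e}")
```

Formula (27) follows the tabulated values closely without reproducing them exactly, which is the discrepancy remarked on in the editorial footnote.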


* ed: Neither the final (~) term in equation (26) nor equation (27) yields exactly the 
results shown in column 2 of the Table. However, both are quite close. It is possible 
that von Neumann did the calculation in his head, but even then it would have been 
expected that he got exactly correct answers using either of the formulas. 

10.5.3 Examples. 


10.5.3.1 First example. 

To get an idea of the significance of these sizes and the corresponding approximations, 
consider the two following examples. 

Consider first a computing machine with 2500 vacuum tubes each of which is 
actuated on the average once every 5 microseconds. Assume that a mean free path 
of 8 hours between errors is desired. In this period of time there will have been 
$\frac15 \times 2{,}500 \times 8 \times 3{,}600 \times 10^6 = 1.4 \times 10^{13}$ actuations, hence 
the above specification calls for $\delta \sim 1/[1.4 \times 10^{13}] = 7 \times 10^{-14}$. According 
to the above table this calls for an $N$ between 10,000 and 20,000 - interpolating linearly on 
$-\log_{10}\delta$ gives $N = 14{,}000$. I.e. the system should be multiplexed 14,000 times. 

It is characteristic for the steepness of statistical curves in this domain of large 
numbers of events, that a 25 percent increase of $N$, i.e. $N = 17{,}500$, gives (again by 
interpolation) $\delta = 4.5 \times 10^{-17}$, i.e. a reliability which is 1,600 times better. 

10.5.3.2 Second example. 

Consider second a plausible quantitative picture for the functioning of the human 
nervous system. The number of neurons involved is usually given as $10^{10}$, but this 
number may be somewhat low; also the synaptic end-bulbs and other possible autonomous 
sub-units may increase it significantly, perhaps a few hundred times. Let 
us therefore use the figure $10^{13}$ for the number of basic organs that are present. A 
neuron may be actuated up to 200 times per second, but this is an abnormally high 
rate of actuation. The average neuron will probably be actuated a good deal less 
frequently; in the absence of better information 10 actuations per second may be taken 
as an average figure of at least the right order. It is hard to tell what the mean free 
path between errors should be. Let us take the view that errors properly defined are 
to be quite serious errors, and since they are not ordinarily observed, let us take a 
mean free path which is long compared to an ordinary human life - say 10,000 years. 
This means $10^{13} \times 10{,}000 \times 31{,}536{,}000 \times 10 = 3.2 \times 10^{25}$ actuations, 
hence it calls for $\delta \sim 1/(3.2 \times 10^{25}) = 3.2 \times 10^{-26}$. According to the table 
this lies somewhat beyond $N = 25{,}000$ - extrapolating linearly on $-\log_{10}\delta$ gives 
$N = 28{,}000$. 

Note, that if this interpretation of the functioning of the human nervous system 
were a valid one (for this cf. the remark of Section 11.1 (p. 47)), the number of basic 
organs involved would have to be reduced by a factor 28,000. This reduces the number 
of relevant actuations and increases the value of the necessary $\delta$ by the same factor. 
I.e. $\delta = 9 \times 10^{-22}$, and hence $N = 23{,}000$. The reduction of $N$ is remarkably 
small - only 20%. This makes a reevaluation of the reduced $N$ with the new $N, \delta$ 
unnecessary: In fact the new factor, i.e. 23,000, gives $\delta = 7.4 \times 10^{-22}$ and this, with 
the approximation used above, again $N = 23{,}000$. (Actually the change of $N$ is 
$\sim 120$, i.e. only 1/2%!) 

Replacing the 10,000 years, used above rather arbitrarily, by 6 months, introduces 
another factor 20,000, and therefore a change of about the same size as the above 
one - now the value is easily seen to be $N = 23{,}000$ (uncorrected) or $N = 19{,}000$ 
(corrected). 
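
The interpolations used in both examples can be retraced with a few lines of code. The sketch below is an illustration rather than the original computation: it assumes straight-line interpolation of $N$ against $-\log_{10}\rho$ between neighboring rows of the table of 10.5.2 (extending the last segment beyond 25,000), and the names `TABLE`, `bundle_size` and the example variables are chosen here.

```python
import math

# Rows of the table of 10.5.2: (bundle size N, probability of malfunction).
TABLE = [(1_000, 2.7e-2), (2_000, 2.6e-3), (3_000, 2.5e-4), (5_000, 4.0e-6),
         (10_000, 1.6e-10), (20_000, 2.8e-19), (25_000, 1.2e-23)]


def bundle_size(delta: float) -> float:
    """Interpolate N linearly on -log10(delta) between neighboring table rows.

    Beyond the last row the final segment is extended (linear extrapolation).
    """
    target = -math.log10(delta)
    pts = [(n, -math.log10(p)) for n, p in TABLE]
    for (n0, y0), (n1, y1) in zip(pts, pts[1:]):
        if target <= y1:
            break  # (n0, y0), (n1, y1) now bracket the target
    return n0 + (target - y0) * (n1 - n0) / (y1 - y0)


# First example: 2,500 tubes, one actuation per 5 microseconds, 8 hours.
actuations_machine = 2_500 * 8 * 3_600 * 10**6 / 5
n_machine = bundle_size(1 / actuations_machine)

# Second example: 10^13 organs, 10 actuations per second, 10,000 years.
actuations_brain = 10**13 * 10_000 * 31_536_000 * 10
n_brain = bundle_size(1 / actuations_brain)

# Same example with the number of organs reduced by the factor 28,000.
n_brain_reduced = bundle_size(28_000 / actuations_brain)

if __name__ == "__main__":
    print(f"computing machine: N is about {n_machine:,.0f}")
    print(f"nervous system:    N is about {n_brain:,.0f}")
    print(f"after reduction:   N is about {n_brain_reduced:,.0f}")
```

Rounded to the nearest thousand, the three results agree with the figures 14,000, 28,000 and 23,000 quoted in the two examples.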

10.6 Conclusions. 

All this shows that the order of magnitude of $N$ is remarkably insensitive to 
variations in the requirements, as long as these requirements are rather exacting ones, 
but not wholly outside the range of our (industrial or natural) experience. Indeed, 
the $N$ obtained above were all $\sim 20{,}000$, to within variations lying between -30% 
and +40%. 

10.7 The general scheme of multiplexing. 

This is an opportune place to summarize our results concerning multiplexing, i.e.
Sections 9 (p. 30) and 10 (p. 37). Suppose it is desired to build a machine to perform
the logical function $f(x, y, \ldots)$ with a given accuracy $\eta$ (the probability of malfunction
on the final result of the entire operation), using Scheffer neurons whose reliability
(or accuracy, i.e. the probability of malfunction on a single operation) is $\epsilon$. We assume
$\epsilon = .005$. The procedure is then as follows.

First, design a network $\mathcal{R}$ for this function $f(x, y, \ldots)$ as though the basic (Scheffer)
organs had perfect accuracy. Second, estimate the maximum number of single
(perfect) Scheffer organ reactions (summed over all successive operations of all
the Scheffer organs actually involved) that occur in the network $\mathcal{R}$ in evaluating
$f(x, y, \ldots)$, say $m$ such reactions. Put $\delta = \eta/m$. Third, estimate the bundle size $N$
that is needed to give the multiplexed Scheffer organ-like network (cf. 10.5.2 (p. 43))
an error probability of at most $\delta$. Fourth, replace each single line of the network $\mathcal{R}$
by a bundle of size $N$, and each Scheffer neuron of the network $\mathcal{R}$ by the multiplexed
Scheffer organ network that goes with this $N$ (cf. 10.5.1 (p. 43)); this gives a
network $\mathcal{R}^{(N)}$. A “yes” will then be transmitted by the stimulation of more than 93% of
the strands in a bundle, a “no” by the stimulation of less than 7%, and intermediate
values will signify the occurrence of an essential malfunction of the total system.
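Two pieces of this procedure are simple enough to sketch directly: the per-reaction error budget $\delta = \eta/m$ and the reading of a bundle against the 7% / 93% levels. The Python below is only an illustration; the function names and the sample values of $\eta$ and $m$ are hypothetical.

```python
def per_reaction_error_budget(eta: float, m: int) -> float:
    """delta = eta / m: error probability allowed per multiplexed organ reaction."""
    return eta / m


def read_bundle(stimulated: int, bundle_size: int, fiduciary: float = 0.07) -> str:
    """Interpret a bundle as 'yes', 'no', or 'malfunction' (intermediate level)."""
    level = stimulated / bundle_size
    if level > 1 - fiduciary:
        return "yes"
    if level < fiduciary:
        return "no"
    return "malfunction"


delta = per_reaction_error_budget(eta=1e-3, m=10**6)   # hypothetical eta and m
print(delta)
print(read_bundle(19_500, 20_000))   # 97.5% of the strands stimulated
print(read_bundle(900, 20_000))      # 4.5% of the strands stimulated
print(read_bundle(10_000, 20_000))   # 50% of the strands stimulated
```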

It should be noticed that this construction multiplies the number of lines by $N$
and the number of basic organs by $3N$. (In 10.5.3 (p. 45) we used a uniform factor of
multiplication $N$. In view of the insensitivity of $N$ to moderate changes in $\delta$ that we
observed in 10.5.3.2 (p. 45), this difference is irrelevant.) Our above considerations
show that the size of $N$ is $\sim 20{,}000$ in all cases that interest us immediately. This
implies that such techniques are impractical for present technologies of componentry
(although this may perhaps not be true for certain conceivable technologies of the
future), but they are not necessarily unreasonable (at least not on grounds of size
alone) for the micro-componentry of the human nervous system.

Note that the conditions are significantly less favorable for the non-multiplexing
procedure to control error described in Section 8 (p. 23). That process multiplied the
number of basic organs by $3^{\mu}$, $\mu$ being the number of consecutive steps (i.e. basic
organ actuations) from input to output (cf. the end of 8.4 (p. 29)). (In this way
of counting, iterative processes must be counted as many times as iterations occur.)
Thus for $\mu = 160$, which is not an excessive “logical depth,” even for a conventional
calculation, $3^{160} \sim 2 \times 10^{76}$, i.e. somewhat above the putative order of the number of
electrons in the universe. For $\mu = 200$ (only 25 percent more!), $3^{200} \sim 2.5 \times 10^{95}$,
i.e. $1.2 \times 10^{19}$ times more; in view of the above this requires no comment.
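The growth of $3^{\mu}$ can be checked with exact integer arithmetic. The sketch below is an illustration in Python (the helper name `sci` is an arbitrary choice); it prints truncated leading digits, so the output need not coincide digit for digit with the rounded figures quoted above.

```python
organs_160 = 3**160
organs_200 = 3**200
ratio = organs_200 // organs_160      # exactly 3**40


def sci(n: int) -> str:
    """Format a positive integer as d.dd x 10^k (leading digits truncated, not rounded)."""
    s = str(n)
    return f"{s[0]}.{s[1:3]} x 10^{len(s) - 1}"


print(sci(organs_160))
print(sci(organs_200))
print(sci(ratio))
```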

11. GENERAL COMMENTS ON DIGITALIZATION 
AND MULTIPLEXING. 

11.1 Plausibility of various assumptions regarding the digital 
vs. analog character of the nervous system. 

We now pass to some remarks of a more general character. 

The question of the number of basic neurons required to build a multiplexed
automaton serves as an introduction for the first remark. The above discussion shows
that the multiplexing technique is impractical on the level of present technology,
but quite practical for a perfectly conceivable more advanced technology and for the
natural relay organs (neurons). I.e. it merely calls for a not at all unnatural
micro-componentry. It is therefore quite reasonable to ask specifically whether it, or
something more or less like it, is a feature of the actually existing human (or rather animal)
nervous system.

The answer is not clear cut. The main trouble with the multiplexing systems,
as described in the preceding Section (p. 37), is that they follow too slavishly a fixed
plan of construction, and specifically one that is inspired by the conventional
procedures of mathematics and mathematical logics. It is true that the animal nervous
systems, too, obey some rigid “architectural” patterns in their large-scale
construction, and that those varieties which make one suspect a merely statistical design
seem to occur only in finer detail and on the micro-level. (It is characteristic of this
duality that most investigators believe in the existence of overall laws of large-scale
nerve-stimulation and composite action that have only a statistical character, and yet
occasionally a single neuron is known to control a whole reflex-arc.) It is true that
our multiplexing scheme, too, is rigid only in its large-scale pattern (the prototype
network $\mathcal{R}$, as a pattern, and the general layout of the executive-plus-restoring organ,
as discussed in 10.7 (p. 46) and in 10.5.1 (p. 43)), while the “random” permutation
“black boxes” (cf. the relevant Figures 32 (p. 33), 35 (p. 36), 37 (p. 36) in 9.2.3 (p. 31)
and 9.4.2 (p. 34)) are typical of a “merely statistical design.” Yet the nervous system
seems to be somewhat more flexibly designed. Also, its “digital” (neural) operations
are rather freely alternating with “analog” (humoral) processes in their complete
chains of causation. Finally the whole logical pattern of the nervous system seems to
deviate in certain important traits qualitatively and significantly from our ordinary
mathematical and mathematical-logical modes of operation. The pulse-trains that
carry “quantitative” messages along the nerve fibers do not seem to be coded digital
expressions (like a binary or a (Morse or binary coded) decimal digitalization) of a
number, but rather “analog” expressions of one, by way of their pulse-density, or
something similar, although much more than ordinary care should be exercised in
passing judgments in this field, where we have so little factual information. Also, the
“logical depth” of our neural operations, i.e. the total number of basic operations
from (sensory) input to (memory) storage or (motor) output, seems to be much less
than it would be in any human-made automaton (e.g. a computing machine) dealing with
problems of anywhere nearly comparable complexity. Thus deep differences in the
basic organizational principles are probably present.

Some similarities, in addition to the one referred to above, are nevertheless
undeniable. The nerves are bundles of fibers, like our bundles. The nervous system
contains numerous “neural pools” whose function may well be that of organs devoted
to the restoring of excitation levels. (At least of the two (extreme) levels, e.g. one
near to 0 and one near to 1, as in the case discussed in Section 9 (p. 30), especially in
9.2.2 (p. 31), 9.2.3 (p. 31), and 9.4.2 (p. 34). Restoring one level only, by exciting or
quenching or establishing some intermediate stationary level, destroys rather than
restores information, since a system with a single stable state has a memory capacity
0 (cf. the definition given in 5.2 (p. 15)). For systems which can stabilize (i.e. restore)
more than two excitation levels, cf. 12.6 (p. 55).)

11.2 Remarks concerning the concept of a random 
permutation. 

The second remark on the subject of multiplexed systems concerns the problem (which
was so carefully sidestepped in Section 9 (p. 30)) of maintaining randomness of
stimulation. For all statistical analyses, it is necessary to assume that this randomness
exists. In networks which allow feedback, however, when a pulse from an organ gets
back to the same organ at some later time, there is danger of strong statistical
correlation. Moreover, without randomness, situations may arise where errors tend to be
amplified instead of canceled out. E.g. it is possible that the machine remembers
its mistakes, so to speak, and thereafter perpetuates them. A simplified example
of this effect is furnished by the elementary memory organ of Figure 16 (p. 14), or
by a similar one, based on the Scheffer stroke, shown in Figure 39 (p. 48). We will
discuss the latter. This system, provided it makes no mistakes, fires on alternate
moments of time. Thus it has two possible states: either it fires at even times
or at odd times. (For a quantitative discussion of Figure 16 (p. 14), cf. 5.1 (p. 14).)
However, once the mechanism makes a mistake, i.e. if it fails to fire at the right parity,
or if it fires at the wrong parity, that error will be remembered, i.e. the parity is now
lastingly altered, until there occurs a new mistake. A single mistake thus destroys
the memory of this particular machine for all earlier events. In multiplex systems,
single errors are not necessarily disastrous; but without the “random” permutations
introduced in Section 9 (p. 30), accumulated mistakes can still be dangerous.
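The way a single mistake is “remembered” by such a parity memory can be made concrete with a small simulation. The sketch below is in Python and rests on an assumption that is not spelled out in the text above: the organ is modelled as a unit-delay element whose next output is the negation of its current output (a Scheffer stroke fed by its own output). The function name and the `error_times` parameter are hypothetical.

```python
def run_memory_organ(steps, initial=1, error_times=()):
    """Outputs (1 = fires) of an alternating organ over `steps` moments of time.

    At each time listed in `error_times` the organ malfunctions, i.e. it
    produces the opposite of the correct output.
    """
    outputs = [initial]
    for t in range(1, steps):
        nxt = 1 - outputs[-1]          # correct behaviour: negate previous output
        if t in error_times:
            nxt = 1 - nxt              # a single malfunction
        outputs.append(nxt)
    return outputs


clean = run_memory_organ(10)                    # fires at even times
faulty = run_memory_organ(10, error_times={5})  # one mistake at t = 5
print(clean)
print(faulty)
```

Before the mistake both runs agree; from the mistake onward the faulty run fires at the opposite parity and never recovers on its own.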

To be more specific: consider the network shown in Figure 35 (p. 36), but without
the line-permuting “black box” $U$. If each output line is now fed back into its input
line (i.e. into the one with the same number from above), then pulses return to the
identical organ from which they started, and so the whole organ is in fact a sum of
separate organs according to Figure 39 (p. 48), and hence it is just as subject to error
as a single one of those organs acting independently. However, if a permutation of
the bundle is interposed, as shown in principle by $U$ in Figure 35 (p. 36), then the
accuracy of the system may be (statistically) improved. This is, of course, the trait
which is being looked for by the insertion of $U$, i.e. of a “random” permutation in the
sense of Section 9 (p. 30). But how is it possible to perform a “random” permutation?

The problem is not immediately rigorously defined. It is, however, quite proper to
reinterpret it as a problem that can be stated in a rigorous form, namely: it is desired
to find one or more permutations which can be used in the “black boxes” marked
with $U$ or $U_1$, $U_2$ in the relevant Figures 35 (p. 36), 37 (p. 36), so that the essential
statistical properties that are asserted there are truly present. Let us consider the
simpler one of these two, i.e. the multiplexed version of the simple memory organ of
Figure 39 (p. 48), i.e. a specific embodiment of Figure 35 (p. 36). The discussion
given in 10.3 (p. 41) shows that it is desirable that the permutation $U$ of Figure 35
(p. 36) permute the even lines among each other, and the odd lines among each other.
A possible rigorous variant of the question that should now be asked is this.

Find a fiduciary level $\Delta > 0$ and a probability $\epsilon > 0$, such that for any $\eta > 0$
and any $s = 1, 2, \ldots$ there exists an $N = N(\eta, s)$ and a permutation $U = U^{(N)}$,
satisfying the following requirement: assume that the probability of error in a single
operation of any given Scheffer organ is $\epsilon$. Assume that at the time $t$ all lines of the
above network are stimulated, or that all are not stimulated. Then the number of
lines stimulated at the time $t + s$ will be $> (1 - \Delta)N$ or $< \Delta N$, respectively, with
a probability $> 1 - \delta$. In addition $N(\eta, s) < C \ln(s/\eta)$ with a constant $C$ (which
should not be excessively great).

Note that the results of Section 10 (p. 37) make the surmise seem plausible that
$\Delta = .07$, $\epsilon = .005$ and $C \sim 10{,}000/[8.6 \times \ln 10] \sim 500$ are suitable choices for the
above purpose.
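The value of $C$ and the size of the surmised bound $C \ln(s/\eta)$ are easy to evaluate. A minimal sketch in Python follows; the function name and the sample values of $s$ and $\eta$ are hypothetical and serve only to show the order of magnitude.

```python
import math

# C ~ 10,000 / (8.6 * ln 10), as in the surmise above.
C = 10_000 / (8.6 * math.log(10))


def bundle_size_bound(s: float, eta: float, c: float = C) -> float:
    """Surmised upper bound C * ln(s / eta) on the bundle size N(eta, s)."""
    return c * math.log(s / eta)


print(round(C))
print(round(bundle_size_bound(s=10**6, eta=1e-3)))   # hypothetical s and eta
```

Because the bound is logarithmic in $s/\eta$, even very exacting requirements change it only moderately.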

The following surmise concerning the nature of the permutation $U^{(N)}$ has a certain
plausibility: let $N = 2^l$. Consider the $2^l$ complexes $(d_1, d_2, \ldots, d_l)$ ($d_\lambda = 0, 1$ for
$\lambda = 1, \ldots, l$). Let these correspond in some one to one way to the $2^l$ integers $i = 1, \ldots, N$:

(28) $i \leftrightarrow (d_1, d_2, \ldots, d_l)$

Now let the mapping

(29) $i' = U^{(N)} i$

be induced, under the correspondence (28), by the mapping

(30) $(d_1, d_2, \ldots, d_l) \to (d_l, d_1, \ldots, d_{l-1})$.

Obviously, the validity of our assertion is independent of the choice of the
correspondence (28). Now (30) does not change the parity of $\sum_{\lambda=1}^{l} d_\lambda$, hence the desideratum
that $U^{(N)}$, i.e. (29), should not change the parity of $i$ (cf. above) is certainly fulfilled
if the correspondence (28) is so chosen as to let $i$ have the same parity as $\sum_{\lambda=1}^{l} d_\lambda$.
This is clearly possible, since on either side each parity occurs precisely $2^{l-1}$ times.
This should fulfill the above requirements.
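The construction of (28) to (30) can be written out explicitly. The Python sketch below picks one particular parity-matching correspondence (28), which is an arbitrary choice among many, and builds the permutation induced by the cyclic shift (30). It checks only that the result is a permutation that keeps even lines among the even and odd lines among the odd; it does not test the statistical requirement stated above. The function name is hypothetical.

```python
from itertools import product


def build_permutation(l):
    """Return {i: i'} on 1..2**l induced by the cyclic shift of (d_1, ..., d_l)."""
    n = 2 ** l
    complexes = list(product((0, 1), repeat=l))

    # Correspondence (28): give i the same parity as d_1 + ... + d_l.
    even_complexes = [d for d in complexes if sum(d) % 2 == 0]
    odd_complexes = [d for d in complexes if sum(d) % 2 == 1]
    even_ints = [i for i in range(1, n + 1) if i % 2 == 0]
    odd_ints = [i for i in range(1, n + 1) if i % 2 == 1]
    to_complex = dict(zip(even_ints, even_complexes))
    to_complex.update(zip(odd_ints, odd_complexes))
    to_int = {d: i for i, d in to_complex.items()}

    # Mapping (30): (d_1, ..., d_l) -> (d_l, d_1, ..., d_{l-1}).
    def shift(d):
        return (d[-1],) + d[:-1]

    return {i: to_int[shift(to_complex[i])] for i in range(1, n + 1)}


perm = build_permutation(4)
print(perm)
```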

11.3 Remarks concerning the simplified probability 
assumption. 

The third remark on multiplexed automata concerns the assumption made in 
defining the unreliability of an individual neuron. It was assumed that the probability 
of the neuron failing to react correctly was a constant e, independent of time and of 
all previous inputs. This is an unrealistic assumption. For example, the probability 
of failure for the Scheffer organ of Figure 12 (p. 12) may well be different when the 
inputs $\alpha$ and $\beta$ are both stimulated, from the probability of failure when $\alpha$ and not
$\beta$ is stimulated. In addition, these probabilities may change with previous history, or
simply with time and other environmental conditions. Also, they are quite likely to 
be different from neuron to neuron. Attacking the problem with these more realistic 
assumptions means finding the domains of operability of individual neurons, finding 
the intersection of these domains (even when drift with time is allowed) and finally 
carrying out the statistical estimates for this far more complicated situation. This 
will not be attempted here. 


12. ANALOG POSSIBILITIES. 

12.1 Further remarks concerning analog procedures. 

There is no valid reason for thinking that the system which has been developed in 
the past pages is the only or the best model of any existing nervous system or of any 
potential error-safe computing machine or logical machine. Indeed, the form of our 
model-system is due largely to the influence of the techniques developed for digital 
computing and to the trends of the last sixty years in mathematical logics. Now,
speaking specifically of the human nervous system, this is an enormous mechanism,
at least $10^6$ times larger than any artifact with which we are familiar, and its
activities are correspondingly varied and complex. Its duties include the interpretation of
external sensory stimuli, of reports of physical and chemical conditions, the control
of motor activities and of internal chemical levels, the memory function with its very
complicated procedures for the transformation of and the search for information, and
of course, the continuous relaying of coded orders and of more or less quantitative
messages. It is possible to handle all these processes by digital methods (i.e. by using
numbers and expressing them in the binary system, or, with some additional coding
tricks, in the decimal or some other system), and to process the digitalized, and
usually numericized, information by algebraical (i.e. basically arithmetical) methods.
This is probably the way a human designer would at present approach such
a problem. It was pointed out in the discussion in 11.1 (p. 47) that the available
evidence, though scanty and inadequate, rather tends to indicate that the human 
nervous system uses different principles and procedures. Thus message pulse trains 
seem to convey meaning by certain analogic traits (within the pulse notation — i.e. 
this seems to be a mixed, part digital, part analog system), like the time density of 
pulses in one line, correlations of the pulse time series between different lines in a 
bundle, etc. 

Hence our multiplexed system might come to resemble the basic traits of the 
human nervous system more closely, if we attenuated its rigidly discrete and digital 
character in some respects. The simplest step in this direction, which is rather directly 
suggested by the above remarks about the human nervous system, would seem to be 
this. 

12.2 A possible analog procedure. 

12.2.1 The set up. 

In our prototype network $\mathcal{R}$ each line carries a “yes” (i.e. stimulation) or a “no”
(i.e. non-stimulation) message; these are interpreted as digits 1 and 0, respectively.
Correspondingly, in the final (multiplexed) network $\mathcal{R}^{(N)}$ (which is derived from
$\mathcal{R}$) each bundle carries a “yes” = 1 (i.e. prevalent stimulation) or a “no” = 0 (i.e.
prevalent non-stimulation) message. Thus only two meaningful states, i.e. average
levels of excitation $\xi$, are allowed for a bundle: actually for one of these $\xi \sim 1$ and for
the other $\xi \sim 0$.

Now for large bundle sizes $N$ the average excitation level $\xi$ is an approximately
continuous quantity (in the interval $0 \le \xi \le 1$); the larger $N$, the better the
approximation. It is therefore not unreasonable to try to evolve a system in which $\xi$
is treated as a continuous quantity in $0 \le \xi \le 1$. This means an analog procedure (or
rather, in the sense discussed above, a mixed, part digital, part analog procedure).
The possibility of developing such a system depends, of course, on finding suitable
algebraic procedures that fit into it, and being able to assure its stability in the
mathematical sense (i.e. adequate precision) and in the logical sense (i.e. adequate
control of errors). To this subject we will now devote a few remarks.


12.2.2 The operations. 

Consider a multiplex automaton of the type which has just been considered in
12.2.1 (p. 51), with bundle size $N$. Let $\xi$ denote the level of excitation of the bundle
at any point, that is, the relative number of excited lines. With this interpretation,
the automaton is a mechanism which performs certain numerical operations on a
set of numbers to give a new number (or numbers). This method of interpreting a
computer has some advantages, as well as some disadvantages, in comparison with the
digital, “all or nothing,” interpretation. The conspicuous advantage is that such an
interpretation allows the machine to carry more information with fewer components
than a corresponding digital automaton. A second advantage is that it is very easy to
construct an automaton which will perform the elementary operations of arithmetics.
(Or, to be more precise: an adequate subset of these. Cf. the discussion in 12.3
(p. 52).) For example, given $\xi$ and $\eta$ it is possible to obtain $\frac{1}{2}(\xi + \eta)$ as shown in
Figure 40 (p. 53). Similarly, it is possible to obtain $\alpha\xi + (1 - \alpha)\eta$ for any constant $\alpha$
with $0 \le \alpha \le 1$. (Of course, there must be $\alpha = M/N$, $M = 0, 1, \ldots, N$, but this range
for $\alpha$ is the same “approximate continuum” as that one for $\xi$, hence we may treat the
former as a continuum just as properly as the latter.) We need only choose $\alpha N$ lines
from the first bundle and combine them with $(1 - \alpha)N$ lines from the second. To
obtain the quantity $1 - \xi\eta$ requires the set-up shown in Figure 41 (p. 53).
Finally we can produce any constant excitation level $\alpha$ ($0 \le \alpha \le 1$), by originating a
bundle so that $\alpha N$ lines come from a live source and $(1 - \alpha)N$ from ground.
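These bundle operations can be imitated on lists of 0s and 1s. The Python sketch below is an illustration under stated assumptions: each bundle is generated with its stimulated lines in random positions (standing in for the randomizing permutations), the helper names are hypothetical, and the excitation levels 0.3 and 0.8 are arbitrary sample values.

```python
import random


def make_bundle(xi, n, rng):
    """Bundle of n lines with round(xi * n) stimulated lines in random positions."""
    k = round(xi * n)
    lines = [1] * k + [0] * (n - k)
    rng.shuffle(lines)
    return lines


def excitation(bundle):
    """Relative number of stimulated lines."""
    return sum(bundle) / len(bundle)


def mix(first, second, alpha):
    """Take alpha*N lines from the first bundle and (1 - alpha)*N from the second."""
    k = round(alpha * len(first))
    return first[:k] + second[k:]


def stroke(first, second):
    """Line-by-line Scheffer stroke: a line is stimulated unless both inputs are."""
    return [1 - a * b for a, b in zip(first, second)]


rng = random.Random(0)
N = 20_000
xi_bundle = make_bundle(0.3, N, rng)
eta_bundle = make_bundle(0.8, N, rng)

print(excitation(mix(xi_bundle, eta_bundle, 0.5)))    # expected value 0.55
print(excitation(mix(xi_bundle, eta_bundle, 0.25)))   # expected value 0.675
print(excitation(stroke(xi_bundle, eta_bundle)))      # expected value 1 - 0.24 = 0.76
```

The printed levels scatter around the expected values by a small statistical error that shrinks as the bundle size grows.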

12.3 Discussion of the algebraical calculus resulting from the 
above operations. 

Thus our present analog system can be used to build up a system of algebra where
the fundamental operations are

(31) $\alpha$ (for any constant $\alpha$ in $0 \le \alpha \le 1$), $\quad \alpha\xi + (1 - \alpha)\eta$, $\quad 1 - \xi\eta$.

All these are to be viewed as functions of $\xi$, $\eta$. They lead to a system in which one can
operate freely with all those functions $f(\xi_1, \xi_2, \ldots, \xi_k)$ of any $k$ variables $\xi_1, \xi_2, \ldots, \xi_k$
that the functions of (31) generate, i.e. with all functions that can be obtained by
any succession of the following processes:

(A) In the functions of (31) replace $\xi$, $\eta$ by any variables $\xi_i$, $\xi_j$.

(B) In a function $f(\xi_1^*, \ldots, \xi_l^*)$, that has already been obtained, replace the
variables $\xi_1^*, \ldots, \xi_l^*$ by any functions $\varphi_1(\xi_1, \ldots, \xi_k), \ldots, \varphi_l(\xi_1, \ldots, \xi_k)$,
respectively, that have already been obtained.*

To these purely algebraical-combinatorial processes we add a properly analytical one,
which seems justified, since we have been dealing with approximative procedures
anyway:

(C) If a sequence of functions $f_\mu(\xi_1, \ldots, \xi_k)$, $\mu = 1, 2, \ldots$, that have already been
obtained, converges uniformly (in the domain $0 \le \xi_1 \le 1, \ldots, 0 \le \xi_k \le 1$) for
$\mu \to \infty$ to $f(\xi_1, \ldots, \xi_k)$, then form this $f(\xi_1, \ldots, \xi_k)$.


* ed: This expression was evidently meant to show the segmentation of the
function into successive blocks of length $l$.

Note that in order to have the freedom of operation as expressed by (A), (B),
the same “randomness” conditions must be postulated as in the corresponding parts
of Sections 9 (p. 30) and 10 (p. 37). Hence “randomizing” permutations $U$ must be
interposed between consecutive executive organs (i.e. those described above and
re-enumerated in (A)), just as in the Sections referred to above.

In ordinary algebra the basic functions are different ones, namely:

(32) $\alpha$ (for any constant $\alpha$ in $0 \le \alpha \le 1$), $\quad \xi + \eta$, $\quad \xi\eta$.


It is easily seen that the system (31) can be generated (by (A), (B)) from the
system (32), while the reverse is not obvious (not even with (C) added). In fact (31)
is intrinsically more special than (32), i.e. the functions that (31) generates are fewer
than those that (32) generates (this is true for (A), (B), and also for (A), (B), (C));
the former do not even include $\xi + \eta$. Indeed all functions of (31), i.e. of (A) based
on (31), have this property: if all variables lie in the interval $0 \le \xi \le 1$, then the
function, too, lies in that interval. This property is conserved under the applications
of (B), (C). On the other hand $\xi + \eta$ does not possess this property, hence it
cannot be generated by (A), (B), (C) from (31). (Note that the above property of the
functions of (31), and of all those that they generate, is a quite natural one: they
are all dealing with excitation levels, and excitation levels must, by their nature, be
numbers $\xi$ with $0 \le \xi \le 1$.)
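The processes of substitution and the interval property can be illustrated on plain numbers. In the Python sketch below, the operations of the analog system are written as ordinary functions; `negate` and `times` are hypothetical names for two functions obtained by substitution (a constant 1 for one variable, then one generated function into another). The grid check uses a small tolerance for floating-point rounding.

```python
def mix(xi, eta, alpha):
    """alpha*xi + (1 - alpha)*eta for a constant alpha in [0, 1]."""
    return alpha * xi + (1 - alpha) * eta


def stroke(xi, eta):
    """1 - xi*eta."""
    return 1 - xi * eta


def negate(xi):
    """1 - xi, obtained by substituting the constant 1 for eta in the stroke."""
    return stroke(xi, 1.0)


def times(xi, eta):
    """xi*eta, obtained as 1 - (1 - xi*eta) by substituting one function into another."""
    return negate(stroke(xi, eta))


def in_unit_interval(x, tol=1e-12):
    return -tol <= x <= 1 + tol


grid = [i / 10 for i in range(11)]
generated_stay_inside = all(
    in_unit_interval(f(x, y))
    for x in grid
    for y in grid
    for f in (stroke, times, lambda a, b: mix(a, b, 0.3))
)
plain_sum_stays_inside = all(in_unit_interval(x + y) for x in grid for y in grid)

print(times(0.3, 0.5))
print(generated_stay_inside, plain_sum_stays_inside)
```

The generated functions never leave the unit interval on the grid, whereas the plain sum does as soon as both arguments are large.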

In spite of this limitation, which seems to mark it as essentially narrower than 
conventional algebra, the system of functions generated (by (A), (B), (C)) from (31) is 
broad enough for all reasonable purposes. Indeed, it can be shown that the functions 
so generated comprise precisely the following class of functions: 

All functions $f(\xi_1, \xi_2, \ldots, \xi_k)$ which, as long as their variables $\xi_1, \xi_2, \ldots, \xi_k$ lie in
the interval $0 \le \xi \le 1$, are continuous and have their value lying in that interval, too.

We will not give the proof here; it runs along quite conventional lines.

12.4 Limitations of this system.

This result makes it clear that the above analog system, i.e. the system of (31),
guarantees for numbers $\xi$ with $0 \le \xi \le 1$ (i.e. for the numbers that it deals with,
namely excitation levels) the full freedom of algebra and of analysis.

In view of these facts, this analog system would seem to have clear superiority
over the digital one. Unfortunately, the difficulty of maintaining accuracy levels
counterbalances the advantages to a large extent. The accuracy can never be expected
to exceed $1/N$. In other words, there is an intrinsic noise level of the order $1/N$, i.e.
for the $N$ considered in 10.5.2 (p. 43) and 10.5.3 (p. 45) (up to $\sim 20{,}000$) at best $10^{-4}$.
Moreover, in its effects on the operations of (31), this noise level rises from $1/N$ to
$1/\sqrt{N}$. (E.g. for the operation $1 - \xi\eta$, cf. the result (21) and the argument that leads
to it.) With the above assumptions, this is at best $\sim 10^{-2}$, i.e. 1%! Hence after
a moderate number of operations the excitation levels are more likely to resemble a
random sampling of numbers than mathematics.
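For the largest bundle size considered, the two noise levels can be evaluated directly. A short Python sketch (variable names are illustrative):

```python
import math

N = 20_000                          # largest bundle size considered above

resolution = 1 / N                  # intrinsic granularity of an excitation level
operation_noise = 1 / math.sqrt(N)  # noise level after an operation such as 1 - xi*eta

print(resolution)
print(operation_noise)
```

Both values are of the orders of magnitude quoted above, $10^{-4}$ and $10^{-2}$ respectively.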

It should be emphasized, however, that this is not a conclusive argument that 
the human nervous system does not utilize the analog system. As was pointed out 
earlier, it is in fact known for at least some nervous processes that they are of an 
analog nature, and that the explanation of this may, at least in part, lie in the fact 
that the “logical depth” of the nervous network is quite shallow in some relevant 
places. To be more specific: the number of synapses of neurons from the peripheral 
sensory organs down the afferent nerve fibers, through the brain, back through the 
efferent nerves to the motor system may not be more than ~ 10. Of course the 
parallel complexity of the network of neurons is indisputable. “Depth” introduced by 
feedback in the human brain may be overcome by some kind of self-stabilization. At 
the same time, a good argument can be put up that the animal nervous system uses 
analog methods (as they are interpreted above) only in the crudest way, accuracy 
being a very minor consideration. 

12.5 A plausible analog mechanism: Density modulation by 
fatigue. 

Two more remarks should be made at this point. The first one deals with some 
more specific aspects of the analog element in the organization and functioning of 
the human nervous system. The second relates to the possibility of stabilizing the 
precision level of the analog procedure that was outlined above. 

This is the first remark. As we have mentioned earlier, many neurons of the
nervous system transmit intensities (i.e. quantitative data) by analog methods, but
in a way entirely different from the method described in 12.2, 12.3 and 12.4 (pp. 51-54).
Instead of the level of excitation of a nerve (i.e. of a bundle of nerve fibers) varying,
as described in 12.2 (p. 51), the single nerve fibers fire repetitiously, but with varying
frequency in time. For example, the nerves transmitting a pressure stimulus may
vary in frequency between, say, 6 firings per second and, say, 60 firings per second.
This frequency is a monotone function of the pressure. Another example is the optic
nerve, where a certain set of fibers responds in a similar manner to the intensity of
the incoming light. This kind of behavior is explained by the mechanism of neuron
operation, and in particular by the phenomena of threshold and of fatigue. With
any peripheral neuron at any time can be associated a threshold intensity: a stimulus
will make the neuron fire if and only if its magnitude exceeds the threshold intensity.
The behavior of the threshold intensity as a function of the time after a typical
neuron fires is qualitatively pictured in Figure 42 (p. 55). After firing, there is an
“absolute refractory period” of about 5 milliseconds, during which no stimulus can
make the neuron fire again. During this period, the threshold value is infinite. Next
comes a “relative refractory period” of about 10 milliseconds, during which time the
threshold level drops back to its equilibrium value (it may even oscillate about this
value a few times at the end). This decrease is for the most part monotonic. Now
the nerve will fire again as soon as it is stimulated with an intensity greater than
its excitation threshold. Thus if the neuron is subjected to continual excitation of
constant intensity (above the equilibrium intensity), it will fire periodically with a
period between 5 and 15 milliseconds, depending on the intensity of the stimulus.
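How a constant stimulus is turned into a firing period can be sketched with a toy threshold curve. The Python below is purely illustrative: the two refractory durations are taken from the text, but the shape of the threshold curve during the relative refractory period, the parameter `k`, and the function name are assumptions made for this example, not a physiological model.

```python
ABSOLUTE_MS = 5.0    # absolute refractory period (from the text)
RELATIVE_MS = 10.0   # relative refractory period (from the text)


def firing_period_ms(intensity, equilibrium=1.0, k=1.0):
    """Time after a firing at which the assumed threshold drops to `intensity`.

    Assumed threshold for 5 < t < 15 ms after firing:
        threshold(t) = equilibrium * (1 + k * (15 - t) / (t - 5)),
    which is infinite at t = 5, decreases monotonically, and reaches the
    equilibrium value at t = 15. Solving threshold(t) = intensity for t
    gives the closed form used below.
    """
    if intensity <= equilibrium:
        raise ValueError("stimulus must exceed the equilibrium threshold")
    r = intensity / equilibrium - 1.0
    end = ABSOLUTE_MS + RELATIVE_MS
    return (k * end + r * ABSOLUTE_MS) / (k + r)


for intensity in (1.1, 2.0, 10.0):
    period = firing_period_ms(intensity)
    print(intensity, round(period, 2), "ms,", round(1000.0 / period, 1), "firings per second")
```

Under these assumptions a stronger stimulus always gives a shorter period, and every period lies between 5 and 15 milliseconds.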


[Figure 42: threshold intensity as a function of the time after firing (label: “Normal”).]


Another interesting example of a nerve network which transmits intensity by
this means is the human acoustic system. The ear analyzes a sound wave into its
component frequencies. These are transmitted to the brain through different nerve
fibers, with the intensity variations of the corresponding component represented by
the frequency modulation of nerve firing.

The chief purpose of all this discussion of nervous systems is to point up the 
fact that it is dangerous to identify the real physical (or biological) world with the 
models which are constructed to explain it. The problem of understanding the animal 
nervous action is far deeper than the problem of understanding the mechanism of a 
computing machine. Even plausible explanations of nervous reaction should be taken 
with a very large grain of salt. 

12.6 Stabilization of the analog system. 

We now come to the second remark. It was pointed out earlier, that the analog 
mechanism that we discussed may have a way of stabilizing excitation levels to a 
certain precision for its computing operations. This can be done in the following way. 

For the digital computer, the problem was to stabilize the excitation level at (or
near) the two values 0 and 1. This was accomplished by repeatedly passing the bundle
through a simple mechanism which changed an excitation level $\xi$ into the level $f(\xi)$,
where the function $f(\xi)$ had the general form shown in Figure 43 (p. 56). The reason
that such a mechanism is a restoring organ for the excitation levels $\xi \sim 0$ and $\xi \sim 1$
(i.e. that it stabilizes at, or near, 0 and 1) is that $f(\xi)$ has this property: for
some suitable $\beta$ ($0 < \beta < 1$), $0 < \xi < \beta$ implies $0 < f(\xi) < \xi$, and $\beta < \xi < 1$ implies
$\xi < f(\xi) < 1$. Thus $\xi = 0, 1$ are the only stable fixpoints of $f(\xi)$. (Cf. the discussion
in 9.2.3 (p. 31) and 9.4.2 (p. 34).)



Now consider another $f(\xi)$ which has the form shown in Figure 44 (p. 56). I.e.
we have:

$0 = \alpha_0 < \beta_1 < \alpha_1 < \ldots < \alpha_{\nu-1} < \beta_\nu < \alpha_\nu = 1$,

for $i = 1, \ldots, \nu$: $\alpha_{i-1} < \xi < \beta_i$ implies $\alpha_{i-1} < f(\xi) < \xi$,

$\beta_i < \xi < \alpha_i$ implies $\xi < f(\xi) < \alpha_i$.

Here $\alpha_0 (= 0), \alpha_1, \ldots, \alpha_{\nu-1}, \alpha_\nu (= 1)$ are $f(\xi)$'s only stable fixpoints, and such a
mechanism is a restoring organ for the excitation levels $\xi \sim \alpha_0 (= 0), \alpha_1, \ldots, \alpha_{\nu-1},
\alpha_\nu (= 1)$. Choose, e.g., $\alpha_i = i/\nu$ ($i = 0, 1, \ldots, \nu$), with $\nu^{-1} < \delta$, or more generally,
just $\alpha_i - \alpha_{i-1} < \delta$ ($i = 1, \ldots, \nu$) with some suitable $\nu$. Then this restoring organ
clearly conserves precisions of the order $\delta$ (with the same prevalent probability with
which it restores).
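A function with the stated properties is easy to write down, and iterating it shows the restoring effect. The Python sketch below assumes equally spaced stable levels $\alpha_i = i/\nu$ with the unstable points $\beta_i$ at the midpoints, and uses a cubic “smoothstep” inside each interval; this particular $f$ and the function names are illustrative choices, not the characteristic of Figure 44.

```python
import math


def restore_once(xi, nu):
    """One pass through an assumed nu-level restoring characteristic f."""
    i = min(math.floor(xi * nu), nu - 1)   # index of the interval [i/nu, (i+1)/nu]
    u = xi * nu - i                        # position inside the interval, 0..1
    g = u * u * (3.0 - 2.0 * u)            # g(u) < u below 1/2, g(u) > u above 1/2
    return (i + g) / nu


def restore(xi, nu, passes=20):
    """Repeatedly pass the level xi through the restoring organ."""
    for _ in range(passes):
        xi = restore_once(xi, nu)
    return xi


print(restore(0.32, nu=10))   # pulled toward the stable level 0.3
print(restore(0.38, nu=10))   # pulled toward the stable level 0.4
```

A level is pulled to the nearest stable level, so a perturbation smaller than half the spacing $1/\nu$ is removed by repeated passes.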


13. CONCLUDING REMARK. 

13.1 A possible neurological interpretation. 

There remains the question whether such a mechanism is possible with the means
that we are now envisaging. We have seen further above that this is the case if a
function $f(\xi)$ with the properties just described can be generated from (31). Such
a function can indeed be so generated. Indeed, this follows immediately from the
general characterization of the class of functions that can be generated from (31),
discussed in 12.3 (p. 52). However, we will not go here into this matter any further.

It is not inconceivable that some “neural pools” in the human nervous system
may be such restoring organs, to maintain accuracy in those parts of the network
where the analog principle is used, and where there is enough “logical depth” (cf.
12.4 (p. 54)) to make this type of stabilization necessary.